Abstract

A general theoretical framework is presented for analyzing information transmission over Gaussian channels with memoryless transceiver distortion, which encompasses various nonlinear distortion models including transmit-side clipping, receive-side analog-to-digital conversion, and others. The framework is based on the so-called generalized mutual information (GMI), and the analysis in particular benefits from the setup of Gaussian codebook ensemble and nearest-neighbor decoding, for which it is established that the GMI takes a general form analogous to the channel capacity of undistorted Gaussian channels, with a reduced "effective" signal-to-noise ratio (SNR) that depends on the nominal SNR and the distortion model. When applied to specific distortion models, an array of results of engineering relevance is obtained. For channels with transmit-side distortion only, it is shown that a conventional approach, which treats the distorted signal as the sum of the original signal part and an uncorrelated distortion part, achieves the GMI. For channels with output quantization, closed-form expressions are obtained for the effective SNR and the GMI, and related optimization problems are formulated and solved for quantizer design. Finally, super-Nyquist sampling is analyzed within the general framework, and it is shown that sampling beyond the Nyquist rate increases the GMI for all SNR. For example, with binary symmetric output quantization, information rates exceeding one bit per channel use are achievable by sampling the output at four times the Nyquist rate.
I. Introduction

In digital communication systems, various forms of distortion are ubiquitous, acting as the main limiting factor for information transmission. Distortions that arise from signal propagation, such as shadowing and multipath fading, have been studied extensively since the earliest era of digital communications [1]. The current paper, instead, is concerned with the other category of distortions, which arise mainly from the engineering of transceivers. This category encompasses a number of models of practical importance, including the clipping or saturation of transmitted waveforms due to power amplifier nonlinearity, the analog-to-digital conversion (i.e., quantization) of received samples, and others. Such distortions are difficult to eliminate, and designers may even introduce them deliberately, for practical reasons such as hardware cost reduction and energy efficiency improvement.

We can usually approximate the aforementioned transceiver distortions as memoryless deterministic functions. Those functions, however, are generally nonlinear and thus break the linearity of Gaussian channels. From a purely information-theoretic perspective, nonlinearity does not pose a fundamental conceptual difficulty, since the channel capacity is still the maximum of the mutual information between the channel input and the distorted channel output. From an engineering perspective, however, the general mutual information maximization problem often yields few insights, especially because such maximization problems are analytically difficult, or even intractable, for general nonlinear channel models.

There are a number of existing works, largely scattered in the literature, that seek to characterize the information-theoretic behavior of nonlinear transceiver distortion. In [2], the authors examined the channel capacity of clipped orthogonal frequency-division multiplexing (OFDM) systems, with the key approximation that the distortion due to clipping acts as additional Gaussian noise. Such an approximation originates from a theorem due to Bussgang [3], which implies that the output process of a Gaussian input process through a memoryless distortion device is the sum of a scaled input process and a distortion process that is uncorrelated with the input process. Regarding Nyquist-sampled real Gaussian channels with output quantization, an earlier study [4] examined the achievable mutual information as the signal-to-noise ratio (SNR) decreases toward zero. Specifically, the numerical study therein revealed that for a binary symmetric output quantizer, the ratio between the capacity per channel use (c.u.) and the SNR approaches $1/\pi$, and that for a uniform octal (i.e., 8-level) output quantizer, this ratio is no less than 0.475. In [5], the authors further established some general results for Nyquist-sampled real Gaussian channels, asserting that with $K$-level output quantization, the capacity is achieved by choosing no more than $K+1$ input levels, and that with binary symmetric output quantization the capacity is indeed achieved by a binary symmetric input distribution. For $K > 2$, however, it is necessary to use intensive numerical methods such as the cutting-plane algorithm to compute the capacity. The authors of [6] addressed the capacity of multiple-input multiple-output block-Rayleigh fading channels with binary symmetric output quantization. In [7], the authors went beyond the Nyquist-sampled channel model, demonstrating that the low-SNR capacity of a real Gaussian channel with binary symmetric output quantization, when sampled at twice the Nyquist rate, is higher than that when sampled at the Nyquist rate. In [8], the authors proved that by using a binary asymmetric output quantizer design, it is possible to achieve the same low-SNR asymptotic capacity as without output quantization.

Recognizing the challenge of working with channel capacity directly, we take an alternative route that seeks to characterize achievable information rates for a specific encoding/decoding scheme. As the starting point of our study, in the current paper we consider a real Gaussian channel with general transceiver distortion, and focus on the Gaussian codebook ensemble and the nearest-neighbor decoding rule. We use the so-called generalized mutual information (GMI) [9], [10] to characterize the achievable information rate. As a performance measure for mismatched decoding, GMI has proved convenient and useful in several other scenarios, such as multipath fading channels [10]. In our exercise with GMI, we aim to provide key engineering insights into the understanding and design of transceivers with nonlinearity. Our approach is somewhat similar in nature to that of [11], where the authors addressed decoder design under a finite-resolution constraint, using a performance metric akin to the cutoff rate that also derives from a random-coding argument.

The motivation for using the GMI as the performance measure, together with the Gaussian codebook ensemble and nearest-neighbor decoding, is twofold. On the one hand, this approach enables us to obtain an array of analytical results that are both convenient and insightful, and it bears an "operational" meaning in that the resulting GMI is achievable by a specific encoding/decoding scheme whose implementation does not heavily depend on the nonlinear distortion model. On the other hand, the Gaussian codebook ensemble is a reasonable model for approximating the transmitted signals in many modern communication systems, in particular those that employ higher-order modulation or multicarrier techniques like OFDM¹; and the nearest-neighbor decoding rule is a frequently encountered solution in practice, which for channels with nonlinear characteristics is usually easier to implement than maximum-likelihood decoding. Nevertheless, we need to keep in mind that, compared with the capacity, the performance loss of the GMI due to the inherently suboptimal encoding/decoding scheme may not be negligible.

The central result of the current paper is a GMI formula for the real Gaussian channel with general transceiver distortion², taking the form $(1/2)\log(1 + \mathrm{SNR}_e)$. Here $\mathrm{SNR}_e$ depends on the nominal SNR and the transceiver nonlinearity, and we may interpret it as the "effective SNR", owing to its apparent similarity with the role of SNR in the capacity formula for Gaussian channels without distortion. The parameter $\mathrm{SNR}_e$ thus serves as a single-valued performance indicator, based on which we can, in a unified fashion, analyze the behavior of given transceivers, compare different distortion models, and optimize transceiver design.

Applying the aforementioned general GMI formula to specific distortion models, we obtain an array of results of engineering relevance. First, when nonlinear distortion occurs at the transmitter only, we show that the Bussgang decomposition, which represents a received signal as the sum of a scaled input signal part and a distortion part uncorrelated with the input signal, is consistent with the GMI-maximizing nearest-neighbor decoding rule. This result validates the Gaussian clipping noise approximation for transmit-side clipping, as adopted by the authors of [2].

Second, we evaluate the GMI for Nyquist-sampled channels with output quantization. For binary symmetric quantization, we find that the low-SNR asymptotic GMI coincides with the channel capacity. This observation is somewhat surprising, since the GMI is evaluated with respect to a suboptimal input distribution, namely the Gaussian codebook ensemble. On the other hand, there is a gap between the high-SNR asymptotic GMI and the channel capacity, revealing the penalty of the suboptimal input distribution when the effect of noise is negligible. For symmetric quantizers with more than two quantization levels, we formulate a quantizer optimization problem that yields the maximum GMI, and present numerical results for uniform and optimized quantizers. As an example of our results, we show that for octal quantizers, the low-SNR asymptotic GMI is higher than the known lower bound on the channel capacity in the literature [4].

¹In the current paper we confine ourselves to the single-carrier real Gaussian channel model, and will treat multicarrier transmission with nonlinear distortion in a separate work.

²For complex Gaussian channels we also have an analogous result; see Supplementary Material VII-C.

Finally, we explore the benefit of super-Nyquist sampling. Considering a real Gaussian channel with a band-limited pulse-shaping function and general memoryless output distortion, we obtain a formula for its GMI when the channel output is uniformly sampled at $L$ times the Nyquist rate. We then particularize to the case of binary symmetric output quantization. We demonstrate through numerical evaluation that super-Nyquist sampling increases the GMI over all SNR, for different values of $L$. In the low-SNR regime, the asymptotic GMI we obtain for $L = 2$ with a carefully chosen pulse-shaping function almost coincides with the known lower bound on the channel capacity in the literature [7]. In the high-SNR regime, we make the interesting observation that, when the sampling rate is sufficiently high, the GMI exceeds one bit/c.u. At first glance, this result is surprising since the output quantization is binary; however, it is in fact reasonable, because for each channel input symbol there are multiple binary output symbols due to super-Nyquist sampling, and the Gaussian codebook ensemble carries more than one bit per input symbol.

We organize the remaining part of the paper as follows. Section II describes the general Nyquist-sampled channel model and establishes the general GMI formula. Section III treats the scenario where only transmit-side distortion exists, revisiting the well-known decomposition of Bussgang's theorem. Section IV treats the channel model with binary symmetric output quantization. Section V treats symmetric output quantizers with more than two quantization levels. Section VI explores the benefit of super-Nyquist sampling. Finally, Section VII concludes the paper. Auxiliary technical derivations and other supporting results are archived in the Supplementary Material.

II. General Framework for Real-Valued Nyquist-Sampled Channels

With Nyquist sampling, no generality is lost by considering a discrete-time channel model, with a sequence $\{Z_k\}$ of independent and identically distributed (i.i.d.) real Gaussian noise samples, i.e., $Z_k \sim \mathcal{N}(0, \sigma^2)$. The channel input symbols constitute a sequence $\{X_k\}$. Without distortion, the received signal is $Y_k = X_k + Z_k$. However, distortion may affect both the channel input and the channel output. A memoryless distortion, in general form, is a deterministic mapping $f(\cdot)$, which transforms a pair of channel input symbol and noise sample $(x, z)$ into a real number $f(x, z)$. Hence the channel observation at the decoder is
$$W_k = f(X_k, Z_k), \quad k = 1, 2, \ldots, n, \tag{1}$$
where $n$ denotes the codeword length; see the illustration in Figure 1. We note that such a form of distortion can describe the case where the channel output $Y_k = X_k + Z_k$ is distorted, i.e., $w = f(x, z) = f_o(x + z)$; the case where the channel input $X_k$ is distorted by the transmitter, i.e., $w = f(x, z) = f_i(x) + z$; and the case where both input and output are distorted, i.e., $w = f(x, z) = f_o(f_i(x) + z)$.

For transmission, the source selects a message $M$ uniformly at random from $\mathcal{M} = \{1, 2, \ldots, \lfloor e^{nR} \rfloor\}$, and maps the selected message to a transmitted codeword, which is a length-$n$ real sequence $\{X_k(M)\}_{k=1}^{n}$. We restrict the codebook to be an i.i.d. $\mathcal{N}(0, \mathcal{E}_s)$ ensemble. That is, each codeword is a sequence of $n$ i.i.d. $\mathcal{N}(0, \mathcal{E}_s)$ random variables, and all the codewords are mutually independent. Such a choice of codebook ensemble satisfies the average power constraint $\frac{1}{n}\sum_{k=1}^{n}\mathbb{E}[X_k^2(M)] \le \mathcal{E}_s$. We thus define the nominal SNR as $\mathrm{SNR} = \mathcal{E}_s/\sigma^2$.

As is well known, when transceiver distortion is absent (i.e., $w = y$), as the codeword length $n$ grows without bound, the Gaussian codebook ensemble achieves the capacity of the channel, $\frac{1}{2}\log(1 + \mathrm{SNR})$. In the following, we investigate the GMI when the channel experiences the memoryless nonlinear distortion $f(\cdot)$.

To proceed, we restrict the decoder to follow a nearest-neighbor rule, which, upon observing $\{w_k\}_{k=1}^{n}$, computes for all possible messages the distance metric
$$D(m) = \frac{1}{n}\sum_{k=1}^{n}\left[w_k - a x_k(m)\right]^2, \quad m \in \mathcal{M}, \tag{2}$$
and decides the received message as $\hat{m} = \arg\min_{m \in \mathcal{M}} D(m)$. In (2), the parameter $a$ is selected appropriately to optimize the decoding performance. We note that the nearest-neighbor decoder (with $a = 1$) coincides with the optimal (maximum-likelihood) decoder in the absence of distortion, but is in general suboptimal (mismatched) for the distorted channel (1).
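The decoding rule (2) can be illustrated with a small simulation. The Python sketch below draws an i.i.d. Gaussian codebook, sends one codeword through a distortion $w = f(x, z)$, and picks the message minimizing $D(m)$. The block length, codebook size, seed, and the names `nn_decode` and `simulate` are illustrative assumptions; an exhaustive search over $\lfloor e^{nR} \rfloor$ codewords is only feasible for toy sizes.

```python
import math
import random


def nn_decode(w, codebook, a):
    """Return the index m that minimizes D(m) = (1/n) sum_k (w_k - a x_k(m))^2, as in (2)."""
    n = len(w)
    best_m, best_d = None, float("inf")
    for m, x in enumerate(codebook):
        d = sum((wk - a * xk) ** 2 for wk, xk in zip(w, x)) / n
        if d < best_d:
            best_m, best_d = m, d
    return best_m


def simulate(f, es, sigma2, a=1.0, n=128, num_msgs=32, seed=0):
    """Send message 0 over w_k = f(x_k, z_k); return True if the decoder recovers it."""
    rng = random.Random(seed)
    # i.i.d. N(0, Es) Gaussian codebook: all symbols of all codewords are independent.
    codebook = [[rng.gauss(0.0, math.sqrt(es)) for _ in range(n)] for _ in range(num_msgs)]
    w = [f(x, rng.gauss(0.0, math.sqrt(sigma2))) for x in codebook[0]]
    return nn_decode(w, codebook, a) == 0


# Undistorted channel with a = 1, and one-bit output quantization w = sgn(x + z)
# with a = E[f(X,Z)X]/Es = sqrt(2/(pi*(Es + sigma2))) (cf. the binary analysis of Section IV).
print(simulate(lambda x, z: x + z, es=1.0, sigma2=0.1))
print(simulate(lambda x, z: math.copysign(1.0, x + z), es=1.0, sigma2=0.1,
               a=math.sqrt(2.0 / (math.pi * 1.1))))
```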

In the subsequent development in this section, we characterize an achievable rate which guarantees that the average probability of decoding error decreases to zero as $n \to \infty$, for the Gaussian codebook ensemble and the nearest-neighbor decoding rule, following the argument used in [10]. When we consider the probability of decoding error averaged over both the message set and the Gaussian codebook ensemble, due to the symmetry in the codebook, it suffices to condition upon the scenario where the message $m = 1$ is selected for transmission. With $m = 1$, we have
$$\lim_{n\to\infty} D(1) = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\left[W_k - aX_k(1)\right]^2 = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\left[f(X_k, Z_k) - aX_k(1)\right]^2 = \mathbb{E}\left\{[f(X, Z) - aX]^2\right\} \quad \text{a.s.}, \tag{3}$$
where $X \sim \mathcal{N}(0, \mathcal{E}_s)$ and $Z \sim \mathcal{N}(0, \sigma^2)$, from the law of large numbers.
The GMI, which follows from the exponent of the probability of decoding error, is given by
$$I_{\mathrm{GMI}} = \sup_{\theta < 0}\left(\theta\,\mathbb{E}\left\{[f(X, Z) - aX]^2\right\} - \Lambda(\theta)\right), \tag{4}$$
where
$$\Lambda(\theta) = \lim_{n\to\infty}\frac{1}{n}\Lambda_n(n\theta), \tag{5}$$
$$\Lambda_n(n\theta) = \log\mathbb{E}\left\{e^{n\theta D(m)} \,\middle|\, W_k, k = 1, \ldots, n\right\}, \quad \forall m \neq 1. \tag{6}$$
From Chernoff's bound and the union upper bounding technique, we see that as long as the information rate is less than $I_{\mathrm{GMI}}$, the average probability of decoding error decreases to zero as $n \to \infty$. Therefore, the GMI serves as a reasonable lower bound on the achievable information rate for a given codebook ensemble and a given decoding rule.

After the mathematical manipulation given in Supplementary Material VII-A, we establish the following result.

Proposition 1: With Gaussian codebook ensemble and nearest-neighbor decoding, the GMI of the distorted channel (1) is
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\left(1 + \frac{\Delta}{1 - \Delta}\right), \tag{7}$$
where the parameter $\Delta$ is
$$\Delta = \frac{\{\mathbb{E}[f(X, Z)X]\}^2}{\mathcal{E}_s\,\mathbb{E}[f(X, Z)^2]}. \tag{8}$$
The corresponding optimal choice of the decoding scaling parameter $a$ is $a_{\mathrm{opt}} = \mathbb{E}[f(X, Z)X]/\mathcal{E}_s$.
We readily see that $\Delta$ is the squared correlation coefficient between the channel input $X$ and the distorted channel output $f(X, Z)$, which is upper bounded by one, by the Cauchy-Schwarz inequality. A larger value of $\Delta$ corresponds to a higher effective SNR.

When contrasted with the capacity of the undistorted channel, $\frac{1}{2}\log(1 + \mathrm{SNR})$, we can define the effective SNR of the distorted channel as $\mathrm{SNR}_e = \Delta/(1 - \Delta)$.

As an immediate verification, consider the undistorted channel $W_k = X_k + Z_k$, for which we have $\Delta = \mathcal{E}_s/(\mathcal{E}_s + \sigma^2)$. Consequently, the effective SNR is $\mathrm{SNR}_e = \mathcal{E}_s/\sigma^2$, leading to the capacity of the undistorted channel.

It is perhaps worth noting that the derivation of the GMI does not in fact require $Z_k$ to be Gaussian. Indeed, as long as $\{Z_k\}$ is an ergodic process independent of $\{X_k\}$, the general result of Proposition 1 holds. However, for simplicity, in the current paper we confine ourselves to i.i.d. Gaussian noise, and do not pursue this issue further.
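Proposition 1 reduces the GMI to two moments of the distorted output. As a numerical sanity check that is not part of the original analysis, the Python sketch below estimates $\mathbb{E}[f(X,Z)X]$ and $\mathbb{E}[f(X,Z)^2]$ by Monte Carlo and then evaluates (8), $a_{\mathrm{opt}}$, and (7) in nats. The function name, sample size, and seed are our choices, so the outputs are estimates carrying a sampling error of a few thousandths.

```python
import math
import random


def gmi_monte_carlo(f, es, sigma2, num_samples=200_000, seed=1):
    """Monte Carlo estimate of (Delta, a_opt, I_GMI in nats/c.u.) from Proposition 1."""
    rng = random.Random(seed)
    sum_xw = sum_ww = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, math.sqrt(es))       # X ~ N(0, Es)
        z = rng.gauss(0.0, math.sqrt(sigma2))   # Z ~ N(0, sigma^2)
        w = f(x, z)
        sum_xw += x * w
        sum_ww += w * w
    e_xw = sum_xw / num_samples                 # estimate of E[f(X,Z) X]
    e_ww = sum_ww / num_samples                 # estimate of E[f(X,Z)^2]
    delta = e_xw ** 2 / (es * e_ww)             # eq. (8)
    a_opt = e_xw / es                           # optimal decoder scaling
    gmi = 0.5 * math.log(1.0 + delta / (1.0 - delta))  # eq. (7)
    return delta, a_opt, gmi


# Undistorted check: Delta should be close to Es/(Es + sigma^2) = 0.8 here.
print(gmi_monte_carlo(lambda x, z: x + z, es=1.0, sigma2=0.25))
```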

Remark on Antipodal Codebook Ensemble: The foregoing analysis of GMI applies to any input distribution. Here, consider antipodal inputs, i.e., $X_k(m)$ takes $\sqrt{\mathcal{E}_s}$ and $-\sqrt{\mathcal{E}_s}$ with probability 1/2 each, and all the codeword symbols are mutually independent. Again, we consider a nearest-neighbor decoding rule, with distance metric of the form (2). Following the same line of analysis as that for the Gaussian codebook ensemble, we have
$$I_{\mathrm{GMI}} = \sup_{t}\left(t\,\mathbb{E}[Xf(X, Z)] - \mathbb{E}\log\cosh\left(t\sqrt{\mathcal{E}_s}\,f(X, Z)\right)\right), \tag{9}$$
and the optimal value of $t$ should satisfy
$$\mathbb{E}\left[\sqrt{\mathcal{E}_s}\,f(X, Z)\tanh\left(t\sqrt{\mathcal{E}_s}\,f(X, Z)\right)\right] = \mathbb{E}[Xf(X, Z)] \tag{10}$$
(see Supplementary Material VII-B). The evaluation of this GMI is usually more difficult than that for the Gaussian codebook ensemble.
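For the one-bit output $f(x, z) = \mathrm{sgn}(x + z)$, (9) can be checked in closed form: $|f| = 1$, so the $\log\cosh$ term is a constant, and (10) gives $\tanh(t\sqrt{\mathcal{E}_s}) = 1 - 2p$ with $p = Q(\sqrt{\mathrm{SNR}})$. Substituting back, a short calculation (ours, not stated in the text) yields $\log 2 - H(p)$ nats, i.e., the binary symmetric channel capacity $1 - H_2(p)$ bits. The Python sketch below maximizes (9) numerically by a ternary search over $u = t\sqrt{\mathcal{E}_s}$ and compares the two; the function names and search bounds are hypothetical choices.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def log_cosh(u):
    """Numerically stable log(cosh(u))."""
    u = abs(u)
    return u + math.log1p(math.exp(-2.0 * u)) - math.log(2.0)


def antipodal_sign_gmi_bits(snr, u_max=50.0, iters=200):
    """Maximize (9) for antipodal inputs and f(x, z) = sgn(x + z); returns bits/c.u.

    With u = t * sqrt(Es) and |f| = 1, the objective is u * (1 - 2p) - log cosh(u),
    where p = Q(sqrt(SNR)) and E[X f(X, Z)] = sqrt(Es) * (1 - 2p).
    """
    p = q_func(math.sqrt(snr))
    corr = 1.0 - 2.0 * p

    def objective(u):
        return u * corr - log_cosh(u)

    lo, hi = 0.0, u_max
    for _ in range(iters):  # ternary search; the objective is concave in u
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi)) / math.log(2.0)  # nats -> bits


def bsc_capacity_bits(p):
    """1 - H2(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


for snr in (0.1, 1.0, 10.0):
    print(snr, antipodal_sign_gmi_bits(snr), bsc_capacity_bits(q_func(math.sqrt(snr))))
```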

III. Channels with Transmit-Side Distortion: Bussgang Revisited 

In this section, we briefly consider the scenario where only the channel input is distorted, i.e., $w = f_i(x) + z$. Since $X$ and $Z$ are independent, the optimal choice of the decoding scaling parameter becomes
$$a_{\mathrm{opt}} = \frac{\mathbb{E}[(f_i(X) + Z)X]}{\mathcal{E}_s} = \frac{\mathbb{E}[Xf_i(X)]}{\mathcal{E}_s}. \tag{11}$$
The resulting value of $\Delta$ is
$$\Delta = \frac{\{\mathbb{E}[Xf_i(X)]\}^2}{\mathcal{E}_s\left(\mathbb{E}[f_i(X)^2] + \sigma^2\right)}, \tag{12}$$
and the effective SNR is
$$\mathrm{SNR}_e = \frac{\Delta}{1 - \Delta} = \frac{\{\mathbb{E}[Xf_i(X)]\}^2}{\mathcal{E}_s\left(\mathbb{E}[f_i(X)^2] + \sigma^2\right) - \{\mathbb{E}[Xf_i(X)]\}^2}. \tag{13}$$
Inspecting $a_{\mathrm{opt}}$ in (11), we notice that it leads to the following decomposition of $f_i(X)$:
$$f_i(X) = a_{\mathrm{opt}}X + V, \tag{14}$$
where the distortion $V$ is uncorrelated with the input $X$, since $\mathbb{E}[VX] = \mathbb{E}[Xf_i(X)] - a_{\mathrm{opt}}\mathcal{E}_s = 0$. Recalling the Bussgang decomposition [2], we conclude that, when there is only transmit-side distortion, the optimal decoding scaling parameter in the nearest-neighbor decoding rule coincides with that suggested by Bussgang's theorem. Note that this conclusion does not hold in general when receive-side distortion exists.
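To illustrate (11)-(14), the Python sketch below takes a soft limiter $f_i(x) = \max(-A, \min(A, x))$ as a hypothetical transmit-side clipper. For Gaussian $X$ it uses the identities $\mathbb{E}[Xf_i(X)] = \mathcal{E}_s(1 - 2Q(\gamma))$ and $\mathbb{E}[f_i(X)^2] = \mathcal{E}_s[1 - 2Q(\gamma) - 2\gamma\phi(\gamma) + 2\gamma^2 Q(\gamma)]$ with $\gamma = A/\sqrt{\mathcal{E}_s}$ and $\phi$ the standard normal density (our derivation, not given in the text), and then checks by simulation that the residual $V = f_i(X) - a_{\mathrm{opt}}X$ is uncorrelated with $X$. The clipping level, seed, and function names are illustrative.

```python
import math
import random


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def clipper_stats(es, sigma2, amp):
    """Return (a_opt, Delta, SNR_e) of (11)-(13) for f_i(x) = clip(x, -amp, amp)."""
    g = amp / math.sqrt(es)
    e_xf = es * (1.0 - 2.0 * q_func(g))                                         # E[X f_i(X)]
    e_ff = es * (1.0 - 2.0 * q_func(g) - 2.0 * g * phi(g) + 2.0 * g * g * q_func(g))  # E[f_i(X)^2]
    a_opt = e_xf / es                                                            # eq. (11)
    delta = e_xf ** 2 / (es * (e_ff + sigma2))                                   # eq. (12)
    return a_opt, delta, delta / (1.0 - delta)                                  # eq. (13)


def residual_correlation(es, amp, num_samples=200_000, seed=2):
    """Sample estimate of E[V X] for V = f_i(X) - a_opt X, cf. (14)."""
    a_opt = clipper_stats(es, 1.0, amp)[0]  # a_opt does not depend on sigma^2
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, math.sqrt(es))
        v = max(-amp, min(amp, x)) - a_opt * x
        acc += v * x
    return acc / num_samples


a_opt, delta, snr_e = clipper_stats(es=1.0, sigma2=0.01, amp=1.0)
print(a_opt, snr_e)                            # nominal SNR is 100; clipping lowers SNR_e
print(residual_correlation(es=1.0, amp=1.0))   # should be close to 0
```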

IV. Channels with Binary Symmetric Output Quantization 

In this section, we consider the scenario where the channel output $Y = X + Z$ passes through a binary symmetric hard limiter that retains only its sign information. This is also called one-bit (monobit) quantization or analog-to-digital conversion, and we can write it as $w = f(x, z) = \mathrm{sgn}(x + z)$.

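For this scenario, we have
$$\Delta = \frac{\{\mathbb{E}[X\,\mathrm{sgn}(X + Z)]\}^2}{\mathcal{E}_s}, \tag{15}$$
where we use the fact that the average output power $\mathbb{E}[\mathrm{sgn}(X + Z)^2]$ is unity. Now, in order to facilitate the evaluation of the expectation in the numerator of (15), we introduce the "partial mean" of the random variable $X \sim \mathcal{N}(0, \mathcal{E}_s)$,
$$F(z) = \int_{z}^{\infty}\frac{x}{\sqrt{2\pi\mathcal{E}_s}}e^{-\frac{x^2}{2\mathcal{E}_s}}\,dx = \sqrt{\frac{\mathcal{E}_s}{2\pi}}\exp\left(-\frac{z^2}{2\mathcal{E}_s}\right), \tag{16}$$
which is an even function of $z \in (-\infty, \infty)$. We denote by $p_X(x)$ and $p_Z(z)$ the probability density functions of $X \sim \mathcal{N}(0, \mathcal{E}_s)$ and $Z \sim \mathcal{N}(0, \sigma^2)$, respectively, and proceed as
$$\begin{aligned}\mathbb{E}[X\,\mathrm{sgn}(X + Z)] &= \iint_{x+z>0} x\,p_X(x)p_Z(z)\,dx\,dz - \iint_{x+z<0} x\,p_X(x)p_Z(z)\,dx\,dz \\ &= 2\iint_{x+z>0} x\,p_X(x)p_Z(z)\,dx\,dz = 2\int_{-\infty}^{\infty}p_Z(z)F(-z)\,dz = \mathcal{E}_s\sqrt{\frac{2}{\pi(\mathcal{E}_s + \sigma^2)}},\end{aligned} \tag{17}$$
where the second equality holds because $\mathbb{E}[X] = 0$. This leads to
$$\Delta = \frac{2\mathcal{E}_s}{\pi(\mathcal{E}_s + \sigma^2)}, \tag{18}$$
and
$$\mathrm{SNR}_e = \frac{\Delta}{1 - \Delta} = \frac{2\mathcal{E}_s}{(\pi - 2)\mathcal{E}_s + \pi\sigma^2}. \tag{19}$$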

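So we get the following asymptotic behavior:
• High SNR: When $\mathrm{SNR} = \mathcal{E}_s/\sigma^2 \to \infty$,
$$\mathrm{SNR}_e = \frac{2}{\pi - 2} - \frac{2\pi}{(\pi - 2)^2}\frac{1}{\mathrm{SNR}} + o\left(\frac{1}{\mathrm{SNR}}\right), \tag{20}$$
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\frac{\pi}{\pi - 2} - \frac{1}{\pi - 2}\frac{1}{\mathrm{SNR}} + o\left(\frac{1}{\mathrm{SNR}}\right). \tag{21}$$
• Low SNR: When $\mathrm{SNR} \to 0$,
$$\mathrm{SNR}_e = \frac{2}{\pi}\mathrm{SNR} - \frac{2(\pi - 2)}{\pi^2}\mathrm{SNR}^2 + o(\mathrm{SNR}^2), \tag{22}$$
$$I_{\mathrm{GMI}} = \frac{1}{\pi}\mathrm{SNR} - \frac{\pi - 1}{\pi^2}\mathrm{SNR}^2 + o(\mathrm{SNR}^2). \tag{23}$$

We make two observations. First, at high SNR, the GMI converges to $\frac{1}{2}\log_2\frac{\pi}{\pi - 2} \approx 0.7302$ bits/c.u., strictly less than the limit of the channel capacity, 1 bit/c.u., revealing that the suboptimal Gaussian codebook ensemble incurs a non-negligible penalty when the effect of distortion is dominant. Second, at low SNR, the ratio between the GMI (in nats) and the SNR converges to $1/\pi$, and thus asymptotically coincides with the behavior of the channel capacity [4]. Intuitively, this is because in the low-SNR regime the effect of noise is dominant, and thus the channel is approximately still Gaussian. In Figure 2 we plot the GMI $I_{\mathrm{GMI}}$ and the channel capacity $C = 1 - H_2(Q(\sqrt{\mathcal{E}_s/\sigma^2}))$ [5] versus SNR. The different behaviors of the GMI in the two regimes are evident in the figure.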
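The closed forms (18)-(23) are easy to check numerically. The Python sketch below evaluates the GMI from (18) in bits/c.u. (with $\sigma^2 = 1$, so that $\mathcal{E}_s = \mathrm{SNR}$), alongside the capacity $1 - H_2(Q(\sqrt{\mathrm{SNR}}))$ quoted from [5]; the function names are hypothetical.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def one_bit_gmi_bits(snr):
    """GMI (7) with Delta from (18), in bits/c.u.; sigma^2 normalized to 1."""
    delta = 2.0 * snr / (math.pi * (snr + 1.0))
    return 0.5 * math.log2(1.0 / (1.0 - delta))


def one_bit_capacity_bits(snr):
    """Capacity 1 - H2(Q(sqrt(SNR))) of the one-bit quantized channel, as quoted from [5]."""
    return 1.0 - h2(q_func(math.sqrt(snr)))


for snr_db in (-10, 0, 10, 20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, round(one_bit_gmi_bits(snr), 4), round(one_bit_capacity_bits(snr), 4))
```

The printed table should show the two quantities nearly touching at low SNR, while the GMI saturates near 0.7302 bits/c.u. at high SNR.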

V. Channels with Multi-Bit Output Quantization 

In this section, we continue the exploration of output quantization and specifically consider the scenario where the channel output $Y$ passes through a $2M$-level symmetric quantizer,
$$w = f(x + z) = r_i\,\mathrm{sgn}(x + z) \quad \text{if } |x + z| \in [a_{i-1}, a_i), \tag{24}$$
for $i = 1, \ldots, M$, where $a_0 = 0 < a_1 < \cdots < a_M = \infty$. The parameters include the reconstruction points $\{r_1, \ldots, r_M\}$ and the quantization thresholds $\{a_1, \ldots, a_{M-1}\}$. Note that with $2M$ levels, the quantizer bit-width is $\log_2 M + 1$ bits.
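For a $2M$-level symmetric quantizer, we can evaluate that (see Supplementary Material VII-D)
$$\mathbb{E}[f(X+Z)^2] = 2\sum_{i=1}^{M} r_i^2\left[Q\left(\frac{a_{i-1}}{\sqrt{\mathcal{E}_s+\sigma^2}}\right) - Q\left(\frac{a_i}{\sqrt{\mathcal{E}_s+\sigma^2}}\right)\right], \tag{25}$$
where the Q-function is $Q(z) = \frac{1}{\sqrt{2\pi}}\int_z^{\infty} e^{-x^2/2}\,dx$, and
$$\mathbb{E}[f(X+Z)X] = \mathcal{E}_s\sqrt{\frac{2}{\pi(\mathcal{E}_s+\sigma^2)}}\sum_{i=1}^{M} r_i\left[e^{-\frac{a_{i-1}^2}{2(\mathcal{E}_s+\sigma^2)}} - e^{-\frac{a_i^2}{2(\mathcal{E}_s+\sigma^2)}}\right]. \tag{26}$$
To further simplify the notation, define $\mathcal{Q}(z) = \int_{\sqrt{-2\log z}}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$ for $z \in [0, 1]$ (with $\mathcal{Q}(0) = 0$)³, and introduce $t_i = e^{-\frac{a_i^2}{2(\mathcal{E}_s+\sigma^2)}}$ for $i = 0, 1, \ldots, M$, with $t_0 = 1 > t_1 > \cdots > t_M = 0$. We thus can rewrite
$$\mathbb{E}[f(X+Z)^2] = 2\sum_{i=1}^{M} r_i^2\left[\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)\right], \tag{27}$$
$$\mathbb{E}[f(X+Z)X] = \mathcal{E}_s\sqrt{\frac{2}{\pi(\mathcal{E}_s+\sigma^2)}}\sum_{i=1}^{M} r_i(t_{i-1} - t_i). \tag{28}$$
These lead to
$$\Delta = \frac{\mathcal{E}_s}{\pi(\mathcal{E}_s+\sigma^2)}\cdot\frac{\left[\sum_{i=1}^{M} r_i(t_{i-1} - t_i)\right]^2}{\sum_{i=1}^{M} r_i^2\left[\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)\right]}. \tag{29}$$
In (29), the second factor is independent of the SNR and can be optimized separately. Let us denote this factor by $K_{r,t}$, and write $\Delta = \frac{K_{r,t}\mathcal{E}_s}{\pi(\mathcal{E}_s+\sigma^2)}$. We consequently have the following effective SNR:
$$\mathrm{SNR}_e = \frac{K_{r,t}\mathcal{E}_s}{(\pi - K_{r,t})\mathcal{E}_s + \pi\sigma^2}. \tag{30}$$
• High SNR: When $\mathrm{SNR} \to \infty$,
$$\mathrm{SNR}_e = \frac{K_{r,t}}{\pi - K_{r,t}} - \frac{\pi K_{r,t}}{(\pi - K_{r,t})^2}\frac{1}{\mathrm{SNR}} + o\left(\frac{1}{\mathrm{SNR}}\right), \tag{31}$$
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\frac{\pi}{\pi - K_{r,t}} - \frac{K_{r,t}}{2(\pi - K_{r,t})}\frac{1}{\mathrm{SNR}} + o\left(\frac{1}{\mathrm{SNR}}\right). \tag{32}$$
• Low SNR: When $\mathrm{SNR} \to 0$,
$$\mathrm{SNR}_e = \frac{K_{r,t}}{\pi}\mathrm{SNR} - \frac{K_{r,t}(\pi - K_{r,t})}{\pi^2}\mathrm{SNR}^2 + o(\mathrm{SNR}^2), \tag{33}$$
$$I_{\mathrm{GMI}} = \frac{K_{r,t}}{2\pi}\mathrm{SNR} - \frac{K_{r,t}(2\pi - K_{r,t})}{4\pi^2}\mathrm{SNR}^2 + o(\mathrm{SNR}^2). \tag{34}$$

³We have $\mathcal{Q}(z) = Q(\sqrt{-2\log z}) = (1/2)\,\mathrm{erfc}(\sqrt{-\log z})$.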
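It is thus apparent that the value of $K_{r,t}$ determines the system performance for all SNR. We hence seek to maximize
$$K_{r,t} = \frac{\left[\sum_{i=1}^{M} r_i(t_{i-1} - t_i)\right]^2}{\sum_{i=1}^{M} r_i^2\left[\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)\right]}, \tag{35}$$
where $t_0 = 1 > t_1 > \cdots > t_M = 0$ and $r_i > 0$ for all $i = 1, \ldots, M$.

Taking the partial derivatives of $K_{r,t}$ with respect to $r_i$, $i = 1, \ldots, M$, and setting them to zero, we find that the following set of equations needs to hold for maximizing $K_{r,t}$:
$$r_i = c\cdot\frac{t_{i-1} - t_i}{\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)}, \quad i = 1, \ldots, M, \tag{36}$$
where $c > 0$ is an arbitrary constant, since $K_{r,t}$ is invariant to a common scaling of $\{r_i\}$. Substituting these $\{r_i\}$ into $K_{r,t}$ and simplifying the resulting expression, we obtain
$$K_t = \max_{\{r_i\}} K_{r,t} = \sum_{i=1}^{M}\frac{(t_{i-1} - t_i)^2}{\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)}. \tag{37}$$
That is, the optimal quantizer design should solve the following maximization problem:
$$\max_{t_1, \ldots, t_{M-1}}\sum_{i=1}^{M}\frac{(t_{i-1} - t_i)^2}{\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)}, \quad \text{s.t. } t_0 = 1 > t_1 > \cdots > t_M = 0. \tag{38}$$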
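The quantities in (35)-(37) are straightforward to evaluate numerically. The Python sketch below implements $\mathcal{Q}(z) = \frac{1}{2}\mathrm{erfc}(\sqrt{-\log z})$, $K_{r,t}$ from (35), the reconstruction points (36) with the arbitrary constant set to $c = 1$, and $K_t$ from (37); the helper names and the example threshold vector are hypothetical. As a check, the binary case $M = 1$ ($t_0 = 1$, $t_1 = 0$) gives $K_t = 2$, which reproduces (18), and no other choice of $\{r_i\}$ should exceed $K_t$.

```python
import math


def cal_q(z):
    """Q(sqrt(-2 log z)) = 0.5 * erfc(sqrt(-log z)) for z in [0, 1]; equals 0 at z = 0."""
    if z <= 0.0:
        return 0.0
    return 0.5 * math.erfc(math.sqrt(-math.log(z)))


def k_rt(r, t):
    """K_{r,t} of (35); t = [t_0 = 1, t_1, ..., t_M = 0], r = [r_1, ..., r_M]."""
    num = sum(ri * (t[i] - t[i + 1]) for i, ri in enumerate(r)) ** 2
    den = sum(ri ** 2 * (cal_q(t[i]) - cal_q(t[i + 1])) for i, ri in enumerate(r))
    return num / den


def optimal_r(t):
    """Reconstruction points (36), taking the free scaling constant c = 1."""
    return [(t[i] - t[i + 1]) / (cal_q(t[i]) - cal_q(t[i + 1])) for i in range(len(t) - 1)]


def k_t(t):
    """K_t of (37), i.e., K_{r,t} maximized over r."""
    return sum((t[i] - t[i + 1]) ** 2 / (cal_q(t[i]) - cal_q(t[i + 1]))
               for i in range(len(t) - 1))


t_example = [1.0, 0.7, 0.3, 0.0]   # an arbitrary 6-level (M = 3) threshold vector
print(k_t(t_example), k_rt(optimal_r(t_example), t_example))  # the two values agree
```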



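Example: Fine quantization, $\max_{i=1,\ldots,M}(t_{i-1} - t_i) \to 0$.

In this case, the following approximation becomes accurate:
$$\frac{\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)}{t_{i-1} - t_i} \approx \mathcal{Q}'(t_{i-1}), \quad \forall i = 1, \ldots, M. \tag{39}$$
Since $\mathcal{Q}'(t) = 1/(2\sqrt{\pi}\sqrt{-\log t})$, the resulting $K_t$ behaves like
$$K_t = \sum_{i=1}^{M}\frac{(t_{i-1} - t_i)^2}{\mathcal{Q}(t_{i-1}) - \mathcal{Q}(t_i)} \to \int_0^1\frac{dt}{\mathcal{Q}'(t)} = 2\sqrt{\pi}\int_0^1\sqrt{-\log t}\,dt = 4\sqrt{\pi}\int_0^{\infty}y^2e^{-y^2}\,dy = \pi. \tag{40}$$
Therefore, as the quantization becomes asymptotically fine, the effective SNR given by (30) approaches the nominal SNR, and thus the performance loss due to quantization eventually vanishes.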
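To see the limit (40) numerically, the Python sketch below evaluates $K_t$ on the equally spaced grid $t_i = (M - i)/M$, which satisfies the fine-quantization condition as $M$ grows; the choice of this grid and the helper names are illustrative. Since $\Delta \le 1$ for every SNR, (29) implies $K_t \le \pi$, and refining a grid cannot decrease $K_t$ (the coarse choice of $\{r_i\}$ remains available), so the printed gap $\pi - K_t$ should shrink as $M$ increases.

```python
import math


def cal_q(z):
    """Q(sqrt(-2 log z)) = 0.5 * erfc(sqrt(-log z)); equals 0 at z = 0."""
    if z <= 0.0:
        return 0.0
    return 0.5 * math.erfc(math.sqrt(-math.log(z)))


def k_t(t):
    """K_t of (37) for a threshold vector t = [1, t_1, ..., t_{M-1}, 0]."""
    return sum((t[i] - t[i + 1]) ** 2 / (cal_q(t[i]) - cal_q(t[i + 1]))
               for i in range(len(t) - 1))


def k_t_uniform_grid(m):
    """K_t on the equally spaced grid t_i = (M - i)/M, one instance of fine quantization."""
    return k_t([(m - i) / m for i in range(m + 1)])


for m in (10, 100, 2000):
    k = k_t_uniform_grid(m)
    print(m, k, math.pi - k)
```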

Example: 4-level quantization, $M = 2$

In this case, there is only one variable, $t = t_1$, to optimize. The maximization problem becomes
$$\max_{t \in (0,1)}\left[\frac{(1 - t)^2}{1/2 - \mathcal{Q}(t)} + \frac{t^2}{\mathcal{Q}(t)}\right]. \tag{41}$$
A numerical computation gives $\max_{t \in (0,1)} K_t \approx 2.7725$, and, interestingly, the maximizing $t \approx 0.618$ is numerically close to the reciprocal of the golden ratio, $(\sqrt{5} - 1)/2$.
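The one-dimensional problem (41) can be solved by brute force. The Python sketch below performs a plain grid search over $t$; the grid resolution and names are our choices. It should return a maximum of about 2.7725 near $t \approx 0.618$. As an independent cross-check (our observation): with the centroid-proportional points (36), maximizing $K_t$ over thresholds is equivalent to minimizing the mean-squared error of a unit-variance Gaussian quantizer, so the optimum equals $\pi$ times one minus the well-known 4-level Lloyd-Max distortion, $\pi(1 - 0.1175) \approx 2.7725$.

```python
import math


def cal_q(z):
    """Q(sqrt(-2 log z)) = 0.5 * erfc(sqrt(-log z)); equals 0 at z = 0."""
    if z <= 0.0:
        return 0.0
    return 0.5 * math.erfc(math.sqrt(-math.log(z)))


def k_four_level(t):
    """Objective of (41): K_t for M = 2 with t_0 = 1, t_1 = t, t_2 = 0."""
    return (1.0 - t) ** 2 / (0.5 - cal_q(t)) + t ** 2 / cal_q(t)


# Plain grid search over t in (0, 1); any 1-D optimizer would also work.
grid = [i / 100_000 for i in range(1, 100_000)]
t_best = max(grid, key=k_four_level)
print(round(t_best, 4), round(k_four_level(t_best), 4))
```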
Example: Uniform quantization

In practical systems, uniform quantization is common, in which the thresholds satisfy $a_i = i\sqrt{2(\mathcal{E}_s + \sigma^2)\alpha}$ for $i = 0, 1, \ldots, M - 1$, and $a_M = \infty$, where $\alpha > 0$ is a parameter for optimization. These thresholds lead to $t_i = e^{-i^2\alpha}$ and
$$K_t = \sum_{i=1}^{M-1}\frac{\left(e^{-(i-1)^2\alpha} - e^{-i^2\alpha}\right)^2}{Q\left(\sqrt{2\alpha}(i - 1)\right) - Q\left(\sqrt{2\alpha}\,i\right)} + \frac{e^{-2(M-1)^2\alpha}}{Q\left(\sqrt{2\alpha}(M - 1)\right)}, \tag{42}$$
which can be further maximized over $\alpha > 0$.

In Table I we list the numerical results for optimizing $K_t$ over $\alpha$, up to $M = 8$.

Example: $t$-uniform quantization

An alternative quantizer design is to place the values of $t$ uniformly within $[0, 1]$, i.e., $t_i = (M - i)/M$ for $i = 0, 1, \ldots, M$. This quantization leads to
$$K_t = \frac{1}{M^2}\sum_{i=1}^{M}\frac{1}{\mathcal{Q}\left(1 - (i-1)/M\right) - \mathcal{Q}\left(1 - i/M\right)}. \tag{43}$$
In Table II we list the numerical results of $K_t$ for $t$-uniform quantizers, up to $M = 8$. We notice that $t$-uniform quantization is consistently inferior to optimized uniform quantization.
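The uniform and $t$-uniform designs (42)-(43) can be compared with a short script. The Python sketch below maximizes (42) over $\alpha$ on a logarithmic grid and evaluates (43); the grid range, resolution, and function names are illustrative assumptions, and we do not attempt to reproduce the tabulated values here. For $M = 2$, a uniform quantizer has a single threshold, so optimizing $\alpha$ is equivalent to solving (41).

```python
import math


def q_func(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cal_q(z):
    """Q(sqrt(-2 log z)) = 0.5 * erfc(sqrt(-log z)); equals 0 at z = 0."""
    if z <= 0.0:
        return 0.0
    return 0.5 * math.erfc(math.sqrt(-math.log(z)))


def k_uniform(alpha, m):
    """K_t of (42) for a 2M-level uniform quantizer with parameter alpha > 0 (m >= 2)."""
    s = math.sqrt(2.0 * alpha)
    total = 0.0
    for i in range(1, m):
        num = (math.exp(-(i - 1) ** 2 * alpha) - math.exp(-i ** 2 * alpha)) ** 2
        total += num / (q_func(s * (i - 1)) - q_func(s * i))
    # Last (unbounded) cell: t_{M-1} = exp(-(M-1)^2 alpha), t_M = 0.
    return total + math.exp(-2 * (m - 1) ** 2 * alpha) / q_func(s * (m - 1))


def best_uniform(m, num_points=20_001):
    """Maximize (42) over alpha on a logarithmic grid spanning [1e-4, 10]."""
    alphas = (10 ** (-4 + 5 * j / (num_points - 1)) for j in range(num_points))
    return max(k_uniform(a, m) for a in alphas)


def k_t_uniform(m):
    """K_t of (43) for the t-uniform design t_i = (M - i)/M."""
    return sum(1.0 / (cal_q(1 - (i - 1) / m) - cal_q(1 - i / m))
               for i in range(1, m + 1)) / m ** 2


for m in (2, 4):
    print(m, round(best_uniform(m), 4), round(k_t_uniform(m), 4))
```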

Example: Optimal quantization

We can also solve the optimization problem (38) numerically. In Table III we list the results, up to $M = 8$. We also list the value of the optimal $t_1$, from which we can recursively compute the whole optimal $t$ vector by setting the partial derivatives $\partial K_t/\partial t_i$ to zero for $i = 1, \ldots, M - 2$ in turn; each such condition involves only $t_{i-1}$, $t_i$, and $t_{i+1}$, and thus determines $t_{i+1}$.

From the numerical results in the above examples, we observe that the GMI may be fairly close to the channel capacity at low SNR. For example, with the optimal octal quantizer ($M = 4$), the low-SNR GMI scales with SNR like $0.4827\cdot\mathrm{SNR}$ nats/c.u., which is better than the known lower bound $0.475\cdot\mathrm{SNR}$ in the literature [4]. In Figure 3 we plot the GMI $I_{\mathrm{GMI}}$ achieved by the optimal quantizers for $M = 2, 3, \ldots, 8$. For comparison, we also plot in a dash-dot curve the capacity $(1/2)\log_2(1 + \mathrm{SNR})$ of undistorted channels. We can roughly assess that with $M = 4$ (i.e., 3-bit quantization), the performance gap between the GMI and the undistorted channel capacity is mild up to SNR ≈ 10 dB; and with $M = 8$ (i.e., 4-bit quantization), the gap is mild up to SNR ≈ 15 dB. Compared with the numerically evaluated capacity for 2- and 3-bit quantization in [5], we see that using the Gaussian codebook ensemble and the nearest-neighbor decoding rule induces a 15-25% rate loss at high SNR. Comparing Tables I and III, we further notice that the performance loss due to using uniform quantization is essentially negligible.

Remark on Possible Connection with Capacity per Unit Cost: For a given $2M$-level symmetric quantizer, we can evaluate the channel capacity per unit cost (symbol energy in our context) by optimizing a single nonzero input symbol $x$ (see [12]). Without loss of generality, we let $x > 0$ and the noise variance be unity. Then the capacity per unit cost can be evaluated as
$$\sup_{x > 0}\frac{1}{x^2}\sum_{i=1}^{M}\left[\left(Q(a_{i-1} - x) - Q(a_i - x)\right)\log\frac{Q(a_{i-1} - x) - Q(a_i - x)}{Q(a_{i-1}) - Q(a_i)} + \left(Q(a_{i-1} + x) - Q(a_i + x)\right)\log\frac{Q(a_{i-1} + x) - Q(a_i + x)}{Q(a_{i-1}) - Q(a_i)}\right]. \tag{44}$$
With some manipulations, we find that $K_t/(2\pi)$ is exactly the limit of the term in (44) as $x \to 0$.⁴ Therefore, the GMI coincides with the channel capacity in the low-SNR limit only if the capacity per unit cost (44) is achieved by $x \to 0$. Unfortunately, as revealed by our numerical experiments, this condition does not hold in general for symmetric quantizers.
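The limit statement above can be checked numerically. The Python sketch below evaluates the term inside the supremum of (44) for a small $x$ and compares it with $K_t/(2\pi)$, using $t_i = e^{-a_i^2/2}$, which is the definition of $t_i$ in the low-SNR limit where $\mathcal{E}_s + \sigma^2 \to \sigma^2 = 1$. The threshold value 0.98 and the function names are illustrative; the sketch does not search for the supremum over $x$.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def divergence_per_energy(x, thresholds):
    """Term inside the sup in (44): D(P_{W|x} || P_{W|0}) / x^2 with unit noise variance.

    thresholds = [a_1, ..., a_{M-1}]; a_0 = 0 and a_M = inf are appended here.
    """
    a = [0.0] + list(thresholds) + [math.inf]
    total = 0.0
    for lo, hi in zip(a[:-1], a[1:]):
        p0 = q_func(lo) - q_func(hi)          # probability of either mirrored cell at x = 0
        # shift = -x: positive cell [a_{i-1}, a_i); shift = +x: mirrored negative cell
        for shift in (-x, x):
            p = q_func(lo + shift) - q_func(hi + shift)
            if p > 0.0:
                total += p * math.log(p / p0)
    return total / (x * x)


def k_from_thresholds(thresholds):
    """K_t of (37) with t_i = exp(-a_i^2 / 2), so that the Q-cal(t_i) terms equal Q(a_i)."""
    a = [0.0] + list(thresholds) + [math.inf]
    t = [math.exp(-ai * ai / 2.0) for ai in a]   # exp(-inf) evaluates to 0.0
    return sum((t[i] - t[i + 1]) ** 2 / (q_func(a[i]) - q_func(a[i + 1]))
               for i in range(len(a) - 1))


print(divergence_per_energy(1e-3, []), 1.0 / math.pi)            # binary quantizer
print(divergence_per_energy(1e-3, [0.98]), k_from_thresholds([0.98]) / (2.0 * math.pi))
```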

VI. Super-Nyquist Output Sampling

In this section, we examine the scenario where we sample the channel output at a rate higher 
than the Nyquist rate, and investigate the benefit of increased sampling rates in terms of the 
GMI. 

We start with a continuous-time baseband model in which the transmitted signal is
$$x(t) = \sum_{k=-\infty}^{\infty} X_k\, g\left(t - \frac{k}{2W}\right), \tag{45}$$
where $g(\cdot)$ is a pulse function with unit energy, band-limited within $W$ Hz. In analysis, a commonly used pulse function is the sinc function $g(t) = \sqrt{2W}\,\mathrm{sinc}(2Wt)$ with $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$, which vanishes at the Nyquist sampling instants $t = k/(2W)$, $k = \pm 1, \pm 2, \ldots$. The channel input is a sequence of independent $\mathcal{N}(0, \mathcal{E}_s)$ random variables $\{X_k\}_{k=-\infty}^{\infty}$. With additive white Gaussian noise $z(t)$, the received signal is
$$y(t) = x(t) + z(t). \tag{46}$$
We assume that $z(t)$ is band-limited within $W$ Hz, with a flat in-band two-sided power spectral density. So the autocorrelation function of $z(t)$ is $K_z(\tau) = K_z(0)\,\mathrm{sinc}(2W\tau)$, where $K_z(0)$ is the noise power.

⁴This is also half of the Fisher information for estimating $X = 0$ from the quantized channel output $W$ [12].

We consider a uniform sampler, which samples the channel output $y(t)$ at $L$ times the Nyquist rate. For the $k$-th input symbol, the sampling instants thus are
$$\frac{k}{2W} - \tau_L + \frac{l}{2WL}, \quad l = 0, 1, \ldots, 2(L-1). \tag{47}$$
Here, $\tau_L$ is a constant offset ensuring that the sampling instants are symmetric with respect to the center of the $k$-th input symbol pulse; for example, $\tau_1 = 0$ (Nyquist sampling), $\tau_2 = 1/(4W)$, $\tau_3 = 1/(3W)$, $\tau_4 = 3/(8W)$, and so on. In general, $\tau_L = (L-1)/(2WL)$. Thus we can rewrite (47) as
$$t_{k,l} = \frac{1}{2W}\left(k + \frac{l}{L}\right), \quad l = -L+1, \ldots, L-1. \tag{48}$$
Denote the output samples by $\{Y_{k,l}\}$, with $Y_{k,l} = y(t_{k,l})$. The samples pass through a nonlinear distortion device, so that the observed samples are $W_{k,l} = f(Y_{k,l})$.

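Let us generalize the nearest-neighbor decoding rule of Section II as follows. For all possible messages, the decoder computes the distance metrics
$$D(m) = \frac{1}{n}\sum_{k=1}^{n}\sum_{l=-L+1}^{L-1}\left[w_{k,l} - a_l x_k(m)\right]^2, \quad m \in \mathcal{M}, \tag{49}$$
where $\{a_l\}_{l=-L+1}^{L-1}$ are weighting coefficients, and decides the received message as $\hat{m} = \arg\min_{m \in \mathcal{M}} D(m)$. We then note that
$$\sum_{l=-L+1}^{L-1}\left[w_{k,l} - a_l x_k(m)\right]^2 = \left(\sum_{l=-L+1}^{L-1}a_l^2\right)\left[x_k(m) - \frac{\sum_{l=-L+1}^{L-1}a_l w_{k,l}}{\sum_{l=-L+1}^{L-1}a_l^2}\right]^2 + \sum_{l=-L+1}^{L-1}w_{k,l}^2 - \frac{\left(\sum_{l=-L+1}^{L-1}a_l w_{k,l}\right)^2}{\sum_{l=-L+1}^{L-1}a_l^2},$$
where the last two terms do not depend on $m$. Therefore, without loss of generality, we may consider the simplified nearest-neighbor decoding distance metric
$$D(m) = \frac{1}{n}\sum_{k=1}^{n}\left[\sum_{l=-L+1}^{L-1}\beta_l w_{k,l} - x_k(m)\right]^2, \tag{50}$$
for which the tunable weighting coefficients are $\beta = \{\beta_l\}_{l=-L+1}^{L-1}$; choosing $\beta_l = a_l/\sum_{l'}a_{l'}^2$ recovers the decision of (49).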
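The reduction from (49) to (50) can be checked numerically: for any observations and codebook, the metric (49) equals $\sum_l a_l^2$ times the metric (50) with $\beta_l = a_l/\sum_{l'}a_{l'}^2$, plus a term that does not depend on $m$, so both select the same message. The Python sketch below is illustrative only; the sizes, seed, the one-bit observations, and the function names are our assumptions, and the $1/n$ factors are dropped since they do not affect the decision.

```python
import random


def metric_49(w, x, a):
    """(49) without 1/n: sum_k sum_l (w[k][l] - a[l] * x[k])^2."""
    return sum((wkl - al * xk) ** 2 for wk, xk in zip(w, x) for wkl, al in zip(wk, a))


def metric_50(w, x, beta):
    """(50) without 1/n: sum_k (sum_l beta[l] * w[k][l] - x[k])^2."""
    return sum((sum(b * wkl for b, wkl in zip(beta, wk)) - xk) ** 2 for wk, xk in zip(w, x))


def decode(metric, w, codebook, coeffs):
    """Index of the codeword minimizing the given metric."""
    return min(range(len(codebook)), key=lambda m: metric(w, codebook[m], coeffs))


rng = random.Random(3)
L, n, num_msgs = 3, 20, 50
a = [rng.uniform(0.2, 1.0) for _ in range(2 * L - 1)]       # arbitrary weights a_l
energy = sum(al * al for al in a)
beta = [al / energy for al in a]                              # beta_l = a_l / sum_l' a_l'^2
codebook = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_msgs)]
w = [[rng.choice((-1.0, 1.0)) for _ in range(2 * L - 1)] for _ in range(n)]  # e.g. one-bit samples
print(decode(metric_49, w, codebook, a) == decode(metric_50, w, codebook, beta))
```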

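Following the same procedure as that in Section II for the Nyquist-sampled channel model, we first examine the limit of $D(1)$ assuming that the message $m = 1$ is sent. Since the channel input symbols $X_k$ are i.i.d. and the noise process is wide-sense stationary, the observed samples $W_{k,l}$ constitute an ergodic process.* Consequently, we have
$$\lim_{n\to\infty} D(1) = \mathbb{E}\left[\left(\sum_{l=-L+1}^{L-1}\beta_l W_{0,l} - X_0\right)^2\right] \quad \text{a.s.} \tag{51}$$
On the other hand, for any $m \neq 1$, we have
$$\begin{aligned}\lim_{n\to\infty}\frac{1}{n}\Lambda_n(n\theta) &= \lim_{n\to\infty}\frac{1}{n}\log\mathbb{E}\left[e^{\theta\sum_{k=1}^{n}\left(\sum_{l=-L+1}^{L-1}\beta_l W_{k,l} - X_k(m)\right)^2}\,\Big|\,W_{k,l}, \forall k, l\right] \\ &= \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\left[\frac{\theta}{1-2\theta\mathcal{E}_s}\left(\sum_{l=-L+1}^{L-1}\beta_l W_{k,l}\right)^2 - \frac{1}{2}\log(1-2\theta\mathcal{E}_s)\right] \\ &= \frac{\theta}{1-2\theta\mathcal{E}_s}\mathbb{E}\left[\left(\sum_{l=-L+1}^{L-1}\beta_l W_{0,l}\right)^2\right] - \frac{1}{2}\log(1-2\theta\mathcal{E}_s) \quad \text{a.s.}\end{aligned} \tag{52}$$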



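In both limits above, $\{W_{0,l}\}_{l=-L+1}^{L-1}$ are induced by an infinite sequence of inputs, $\{X_k\}_{k=-\infty}^{\infty}$. So the GMI is
$$I_{\mathrm{GMI}} = \sup_{\boldsymbol{\beta},\,\theta<0}\left\{\theta\mathbb{E}\left[\left(\sum_{l=-L+1}^{L-1}\beta_l W_{0,l} - X_0\right)^2\right] - \frac{\theta}{1-2\theta\mathcal{E}_s}\mathbb{E}\left[\left(\sum_{l=-L+1}^{L-1}\beta_l W_{0,l}\right)^2\right] + \frac{1}{2}\log(1-2\theta\mathcal{E}_s)\right\}, \tag{53}$$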



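and we have the following result, whose derivation is given in Supplementary Material VII-G.

Proposition 2: The GMI with super-Nyquist output sampling is
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\left(1 + \frac{\Delta}{1-\Delta}\right), \tag{54}$$
where $\Delta = \mathbf{b}^T\Omega^{-1}\mathbf{b}/\mathcal{E}_s$, $\Omega$ is a $(2L-1)\times(2L-1)$ matrix with its $(u,l)$-element being $\mathbb{E}[W_{0,u}W_{0,l}]$, and $\mathbf{b}$ is a $(2L-1)$-dimensional vector with its $l$-element being $\mathbb{E}[X_0W_{0,l}]$, $u, l = -L+1, \ldots, L-1$. To achieve the GMI in (54), the optimal weighting coefficients are
$$\boldsymbol{\beta} = \frac{\mathcal{E}_s}{\mathbf{b}^T\Omega^{-1}\mathbf{b}}\,\Omega^{-1}\mathbf{b}. \tag{55}$$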

We notice that the GMI in (54) is a natural extension of that in Proposition 1 for the Nyquist sampling case, and we can also define the effective SNR by $\mathrm{SNR}_e = \Delta/(1-\Delta)$.
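Proposition 2 can be exercised numerically once $\Omega$ and $\mathbf{b}$ are known. The following pure-Python sketch is a hypothetical illustration: the $3\times 3$ matrix $\Omega$ and vector $\mathbf{b}$ below are made-up second-order statistics, not values from the paper. It computes $\Delta$, $\mathrm{SNR}_e$, the GMI in nats, the weights (55) and the multiplier implied by (113), and then evaluates the objective inside the supremum of (53) at that point, which should reproduce the GMI.

```python
import math


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def proposition2(Omega, b, Es):
    """Delta, SNR_e, GMI (nats), optimal beta from (55) and theta from (113)."""
    Oinv_b = solve(Omega, b)
    q = dot(b, Oinv_b)                               # b^T Omega^{-1} b
    delta = q / Es
    beta = [Es / q * v for v in Oinv_b]              # (55)
    theta = (1.0 - Es / (Es - q)) / (2.0 * Es)       # from (113)
    gmi = 0.5 * math.log(1.0 + delta / (1.0 - delta))  # (54)
    return delta, delta / (1.0 - delta), gmi, beta, theta


def objective_53(beta, theta, Omega, b, Es):
    """Expression inside the supremum in (53), written through Omega and b."""
    quad = dot(beta, [dot(row, beta) for row in Omega])  # E[(sum_l beta_l W_{0,l})^2]
    mse = quad - 2.0 * dot(beta, b) + Es                 # E[(sum_l beta_l W_{0,l} - X_0)^2]
    return (theta * mse - theta * quad / (1.0 - 2.0 * theta * Es)
            + 0.5 * math.log(1.0 - 2.0 * theta * Es))


# Hypothetical second-order statistics for L = 2 (illustrative only).
Omega = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]]
b = [0.3, 0.5, 0.3]
Es = 1.0
delta, snr_e, gmi, beta, theta = proposition2(Omega, b, Es)
print(delta, gmi, objective_53(beta, theta, Omega, b, Es))
```

The last two printed values should agree; perturbing $\boldsymbol{\beta}$ or $\theta$ can only lower the objective, since it is concave in $\boldsymbol{\beta}$ for fixed $\theta < 0$ and, after optimizing $\boldsymbol{\beta}$, concave in $\theta$.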



*We note that the transmission of a codeword, $\{X_k\}_{k=1}^{n}$, is finite-length. In order to meet the ergodicity condition, we may slightly modify the model by appending $\{X_k\}_{k=-\infty}^{0}$ and $\{X_k\}_{k=n+1}^{\infty}$, which consist of i.i.d. $\mathcal{N}(0,\mathcal{E}_s)$ random variables as additional interference, to the transmitted codeword.
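A. Binary Symmetric Quantization: Sinc Pulse Function

We examine binary symmetric quantization in which $w = \mathrm{sgn}(y)$. For this purpose, we need to evaluate $\Omega$ and $\mathbf{b}$. For each $l$,
$$Y_{0,l} = \sum_{k=-\infty}^{\infty} X_k\, g\!\left(\frac{l}{2WL} - \frac{k}{2W}\right) + z\!\left(\frac{l}{2WL}\right). \tag{56}$$
Utilizing (56) and noting that $\{X_k\}$ are i.i.d., we have
$$b_l = \mathbb{E}[X_0\,\mathrm{sgn}(Y_{0,l})] = \frac{\mathcal{E}_s\, g\!\left(\frac{l}{2WL}\right)}{\sqrt{\pi\left[\frac{\mathcal{E}_s}{2}\sum_{k=-\infty}^{\infty} g^2\!\left(\frac{l}{2WL} - \frac{k}{2W}\right) + \frac{\sigma^2 W}{2}\right]}}, \tag{57}$$
for $l = -L+1, \ldots, L-1$.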

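The undistorted received signal samples, $Y_{0,u}$ and $Y_{0,l}$, are jointly zero-mean Gaussian. We can further evaluate their correlation coefficient as
$$r_{u,l} = \frac{\mathbb{E}[Y_{0,u}Y_{0,l}]}{\sqrt{\mathrm{var}[Y_{0,u}]}\sqrt{\mathrm{var}[Y_{0,l}]}} = \frac{\frac{\mathcal{E}_s}{\sigma^2W}\sum_{k}g\!\left(\frac{l}{2WL}-\frac{k}{2W}\right)g\!\left(\frac{u}{2WL}-\frac{k}{2W}\right) + \mathrm{sinc}\!\left(\frac{l-u}{L}\right)}{\sqrt{\frac{\mathcal{E}_s}{\sigma^2W}\sum_{k}g^2\!\left(\frac{l}{2WL}-\frac{k}{2W}\right)+1}\,\sqrt{\frac{\mathcal{E}_s}{\sigma^2W}\sum_{k}g^2\!\left(\frac{u}{2WL}-\frac{k}{2W}\right)+1}}.$$
Consequently, the correlation between the hard-limited samples is [13]
$$\Omega_{u,l} = \mathbb{E}[W_{0,u}W_{0,l}] = \frac{2}{\pi}\arcsin r_{u,l}. \tag{58}$$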

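Now in this subsection we focus on the sinc pulse function, $g(t) = \sqrt{2W}\,\mathrm{sinc}(2Wt)$. For this $g(\cdot)$, through (57) and (58) we have
$$b_l = \frac{2\mathcal{E}_s\,\mathrm{sinc}(l/L)}{\sqrt{\pi\sigma^2\left[(2\mathcal{E}_s/\sigma^2)\Sigma(l,l) + 1\right]}}, \qquad r_{u,l} = \frac{(2\mathcal{E}_s/\sigma^2)\Sigma(l,u) + \mathrm{sinc}\!\left(\frac{l-u}{L}\right)}{\sqrt{(2\mathcal{E}_s/\sigma^2)\Sigma(l,l)+1}\,\sqrt{(2\mathcal{E}_s/\sigma^2)\Sigma(u,u)+1}}, \tag{59}$$
where $\Sigma(l,u) = \sum_{k=-\infty}^{\infty}\mathrm{sinc}(l/L - k)\,\mathrm{sinc}(u/L - k)$, which can be further evaluated as $\Sigma(l,u) = \mathrm{sinc}((l-u)/L)$, for all $l, u = -L+1, \ldots, L-1$. So
$$b_l = \sqrt{\frac{2\mathcal{E}_s}{\pi}}\sqrt{\frac{2\mathcal{E}_s/\sigma^2}{2\mathcal{E}_s/\sigma^2+1}}\,\mathrm{sinc}\!\left(\frac{l}{L}\right), \qquad \Omega_{u,l} = \frac{2}{\pi}\arcsin\mathrm{sinc}\!\left(\frac{l-u}{L}\right). \tag{60}$$
When $L = 1$, i.e., Nyquist sampling, we can easily verify that $\Delta = \frac{2}{\pi}\cdot\frac{\mathcal{E}_s}{\mathcal{E}_s + \sigma^2/2}$, thus revisiting the result in Section IV.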
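From the above, we can find the following behavior of the GMI, in which we denote $\mathrm{SNR} = 2\mathcal{E}_s/\sigma^2$, $\mathbf{b}_0 = [\mathrm{sinc}(l/L)]_{l=-L+1,\ldots,L-1}$, and $\Omega_0 = [\arcsin\mathrm{sinc}((l-u)/L)]_{u,l=-L+1,\ldots,L-1}$:
$$\Delta = \frac{\mathrm{SNR}}{\mathrm{SNR}+1}\,\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0, \tag{61}$$
$$\mathrm{SNR}_e = \frac{\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0\cdot\mathrm{SNR}}{\left(1-\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0\right)\mathrm{SNR}+1}. \tag{62}$$

• High-SNR regime: As $\mathrm{SNR}\to\infty$,
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\frac{1}{1-\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0} + o(1). \tag{63}$$
• Low-SNR regime: As $\mathrm{SNR}\to 0$,
$$I_{\mathrm{GMI}} = \frac{\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0}{2}\,\mathrm{SNR} + o(\mathrm{SNR}). \tag{64}$$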

In Table IV we present the numerical results for the asymptotic behavior of the GMI, for different values of $L$. From the numerical results, we see that super-Nyquist sampling yields noticeable benefit for the GMI. In the low-SNR regime, sampling at twice the Nyquist rate attains $\lim_{\mathrm{SNR}\to 0} I_{\mathrm{GMI}}/\mathrm{SNR} = 0.3587$, which is slightly smaller than the lower bound 0.3732 obtained in [7]. In the high-SNR regime, we further observe that for $L \geq 4$ the GMI exceeds 1 bit/c.u. Intuitively, this is because the diversity yielded by super-Nyquist sampling is capable of exploiting the abundant information carried by the Gaussian codebook ensemble.

To further consolidate the above analysis, in Figure 4 we plot the GMI achieved for different values of $L$. We can clearly observe the rate gain from increasing the sampling rate. For comparison, we also plot the AWGN capacity without distortion and the capacity under binary symmetric quantization with Nyquist sampling [5]. We notice that, as $L$ increases, on the one hand, the
performance gap between the GMI and the capacity tends to diminish for SNR smaller than 
dB; on the other hand, the GMI even outperforms the capacity at high SNR. 
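The entries of Table IV follow directly from (61) to (64). As a minimal sketch (pure Python, with a small Gaussian-elimination helper standing in for a linear-algebra library), the code below builds $\mathbf{b}_0$ and $\Omega_0$ for the sinc pulse, computes $\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0$, and derives the high-SNR limit (63) in bits/c.u. and the low-SNR slope (64) in nats; it is intended to reproduce the $L = 1$ and $L = 2$ rows of the table.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def kappa_sinc(L):
    """b0^T Omega0^{-1} b0 for the sinc pulse and binary symmetric quantization."""
    idx = range(-L + 1, L)
    b0 = [sinc(l / L) for l in idx]
    Omega0 = [[math.asin(sinc((l - u) / L)) for l in idx] for u in idx]
    return sum(bi * xi for bi, xi in zip(b0, solve(Omega0, b0)))


for L in (1, 2, 4, 8):
    k = kappa_sinc(L)
    high_snr_bits = 0.5 * math.log2(1.0 / (1.0 - k))  # limit in (63), bits/c.u.
    low_snr_slope = k / 2.0                           # slope in (64), nats
    print(L, round(k, 4), round(high_snr_bits, 4), round(low_snr_slope, 4))
```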

B. Binary Symmetric Quantization: Pulse Function Optimization at Low SNR 

We have already seen in the previous subsection that super-Nyquist sampling yields noticeable benefit. In this subsection, we illustrate that we can realize additional benefit by optimizing the pulse function $g(\cdot)$.

With sampling factor $L$, we restrict the pulse function to take the following form
$$g(t) = \sum_{v=-L+1}^{L-1}\gamma_v\sqrt{2W}\,\mathrm{sinc}(2Wt - v/L), \tag{65}$$
that is, a superposition of $2L-1$ (time-shifted) sinc pulses. The weighting parameters $\{\gamma_v\}_{v=-L+1}^{L-1}$ are such that the energy of $g(t)$ is unity, i.e.,
$$\int_{-\infty}^{\infty} g^2(t)\,dt = \sum_{v=-L+1}^{L-1}\sum_{v'=-L+1}^{L-1}\gamma_v\gamma_{v'}\,\mathrm{sinc}\!\left(\frac{v-v'}{L}\right) = 1, \tag{66}$$
which may be rewritten in matrix form as $\boldsymbol{\gamma}^T\Theta\boldsymbol{\gamma} = 1$, where $\Theta = [\mathrm{sinc}((l-u)/L)]_{u,l=-L+1,\ldots,L-1}$. If we let $\gamma_0 = 1$ and $\gamma_{v\neq 0} = 0$, we obtain the sinc pulse function.

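Through the general formulas (57) and (58), we have, after some algebraic manipulation,
$$\begin{aligned} b_l &= \sqrt{\frac{2\mathcal{E}_s}{\pi}}\sqrt{\frac{2\mathcal{E}_s/\sigma^2}{2\mathcal{E}_s/\sigma^2+1}}\sum_{v=-L+1}^{L-1}\gamma_v\,\mathrm{sinc}\!\left(\frac{l-v}{L}\right), \\ r_{u,l} &= \frac{(2\mathcal{E}_s/\sigma^2)\sum_{v=-L+1}^{L-1}\sum_{v'=-L+1}^{L-1}\gamma_v\gamma_{v'}\,\mathrm{sinc}\!\left(\frac{l-u+v-v'}{L}\right) + \mathrm{sinc}\!\left(\frac{l-u}{L}\right)}{2\mathcal{E}_s/\sigma^2+1}. \end{aligned} \tag{67}$$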

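To illustrate the benefit of optimizing the pulse function, we focus on the low-SNR regime, where $\mathrm{SNR} = 2\mathcal{E}_s/\sigma^2$ approaches zero. We thus have
$$\lim_{\mathrm{SNR}\to 0}\frac{b_l}{\sqrt{\mathcal{E}_s\,\mathrm{SNR}}} = \sqrt{\frac{2}{\pi}}\sum_{v=-L+1}^{L-1}\gamma_v\,\mathrm{sinc}\!\left(\frac{l-v}{L}\right), \tag{68}$$
$$\lim_{\mathrm{SNR}\to 0}\Omega_{u,l} = \frac{2}{\pi}\arcsin\mathrm{sinc}\!\left(\frac{l-u}{L}\right). \tag{69}$$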

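Subsequently, the values of $\Delta$ and $\mathrm{SNR}_e$ in Proposition 2 behave like
$$\lim_{\mathrm{SNR}\to 0}\frac{\Delta}{\mathrm{SNR}} = \lim_{\mathrm{SNR}\to 0}\frac{\mathrm{SNR}_e}{\mathrm{SNR}} = \boldsymbol{\gamma}^T\Theta\Omega_0^{-1}\Theta\boldsymbol{\gamma}, \tag{70}$$
where $\Theta = [\mathrm{sinc}((l-u)/L)]_{u,l=-L+1,\ldots,L-1}$ and $\Omega_0 = [\arcsin\mathrm{sinc}((l-u)/L)]_{u,l=-L+1,\ldots,L-1}$ have been defined previously. Keeping in mind the unit-energy constraint on $g(t)$, the following optimization problem is immediate,
$$\max_{\boldsymbol{\gamma}}\ \boldsymbol{\gamma}^T\Theta\Omega_0^{-1}\Theta\boldsymbol{\gamma}, \quad \text{s.t. } \boldsymbol{\gamma}^T\Theta\boldsymbol{\gamma} = 1. \tag{71}$$
By noting that $\Theta$ is a positive-definite matrix, we can introduce the transform $\tilde{\boldsymbol{\gamma}} = \Theta^{1/2}\boldsymbol{\gamma}$, and rewrite the optimization problem as
$$\max_{\tilde{\boldsymbol{\gamma}}}\ \frac{\tilde{\boldsymbol{\gamma}}^T\Theta^{1/2}\Omega_0^{-1}\Theta^{1/2}\tilde{\boldsymbol{\gamma}}}{\tilde{\boldsymbol{\gamma}}^T\tilde{\boldsymbol{\gamma}}}, \tag{72}$$
for which the maximum value is the largest eigenvalue of $\Theta^{1/2}\Omega_0^{-1}\Theta^{1/2}$, and the optimal $\tilde{\boldsymbol{\gamma}}$ is the unit-norm eigenvector corresponding to the largest eigenvalue; the optimal pulse weights then follow as $\boldsymbol{\gamma} = \Theta^{-1/2}\tilde{\boldsymbol{\gamma}}$.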

In Table V we present the numerical results for the low-SNR asymptotic behavior of the GMI, with the optimal choice of $\boldsymbol{\gamma}$, for different values of $L$. Compared with Table IV, we notice that optimizing the pulse function leads to a noticeable additional improvement in the GMI. In particular, for $L = 2$ our approach yields $\lim_{\mathrm{SNR}\to 0} I_{\mathrm{GMI}}/\mathrm{SNR} = 0.3731$, which almost coincides with the result in [7], 0.3732.†
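The eigenvalue problem (72) is small enough to solve by power iteration. The sketch below is one possible approach, not necessarily how Table V was computed: it uses the fact that $\Theta^{1/2}\Omega_0^{-1}\Theta^{1/2}$ is similar to $\Omega_0^{-1}\Theta$, so both have the same largest eigenvalue, which avoids forming a matrix square root. Half of that eigenvalue is the optimized low-SNR slope in nats, intended to match the $L = 2$ entry 0.3731 of Table V.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def optimized_low_snr_slope(L, iters=1000):
    """Half the largest eigenvalue of Omega0^{-1} Theta (same spectrum as
    Theta^{1/2} Omega0^{-1} Theta^{1/2}), found by power iteration."""
    idx = range(-L + 1, L)
    Theta = [[sinc((l - u) / L) for l in idx] for u in idx]
    Omega0 = [[math.asin(sinc((l - u) / L)) for l in idx] for u in idx]
    x = [1.0] * (2 * L - 1)
    lam = 0.0
    for _ in range(iters):
        y = solve(Omega0, [sum(t * xi for t, xi in zip(row, x)) for row in Theta])
        lam = max(abs(v) for v in y)   # x is kept normalized to max |x_i| = 1
        x = [v / lam for v in y]
    return lam / 2.0


for L in (1, 2, 4):
    print(L, round(optimized_low_snr_slope(L), 4))
```

For $L = 1$ there is only one pulse to weight, so the slope falls back to the sinc-pulse value $1/\pi$ of Table IV.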

VII. Conclusions 

With the surging quest for energy-efficient communication solutions, transceivers with deliberately engineered distortions have attracted much attention in system design. These distortions, such as transmit-side clipping and low-precision receive-side quantization, may significantly reduce power consumption and hardware cost. It is thus imperative for communication engineers to develop a systematic understanding of the impact of these distortions, so as to assess the resulting system performance and to guide the design of distortion mechanisms. In this paper, we make an initial attempt at this goal, developing a general analytical framework for evaluating the achievable information rates using the measure of GMI, and illustrating the application of this framework by examining several representative transceiver distortion models. We hope that both the framework and the applications presented in this paper will be useful for deepening our understanding in this area.

Admittedly, the approach taken in this paper, namely evaluating the GMI for Gaussian codebook ensemble and nearest-neighbor decoding, is inherently suboptimal for general transceiver distortion models. Nevertheless, as illustrated throughout this paper, the general analytical framework built upon such an approach is convenient for performance evaluation and instrumental for system design. In many practically important scenarios, for example the low/moderate-SNR regime, this approach leads to near-optimal performance. Furthermore, as suggested by our analysis of super-Nyquist sampling, we can substantially alleviate the performance loss by sampling the channel output at rates higher than the Nyquist rate.

A number of interesting problems remain unsolved within the scope of this paper. These include, among others: answering whether the GMI coincides with the channel capacity for multi-bit output quantization in the low-SNR limit; identifying more effective ways of processing the samples in super-Nyquist sampled channels; and characterizing the ultimate performance limit of super-Nyquist sampling. Beyond the scope of this paper, one can readily see a whole agenda of research on communication with nonlinear transceiver distortion, including timing recovery, channel estimation, equalization, transmission under multipath fading, and multiantenna/multiuser aspects.

†Since both our result and that in [7] are analytical, we have compared their values in fine precision and found that they are indeed different.
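Supplementary Material
A. Derivation of the GMI in Proposition 1

We proceed as follows. For any $m \neq 1$,
$$\mathbb{E}\left\{e^{n\theta D(m)}\,\big|\,W_k, k = 1,\ldots,n\right\} = \mathbb{E}\left[e^{\theta\sum_{k=1}^{n}(W_k - aX_k(m))^2}\,\Big|\,W_k, k = 1,\ldots,n\right] = \left(1 - 2\theta a^2\mathcal{E}_s\right)^{-n/2}\exp\left(\frac{\theta\sum_{k=1}^{n}W_k^2}{1-2\theta a^2\mathcal{E}_s}\right), \tag{73}$$
by noting that, conditioned upon $W_k$, $(W_k - aX_k(m))^2$ is a scaled noncentral chi-square random variable. This leads to
$$\Lambda_n(n\theta) = \log\mathbb{E}\left\{e^{n\theta D(m)}\,\big|\,W_k, k=1,\ldots,n\right\} = \frac{\theta}{1-2\theta a^2\mathcal{E}_s}\sum_{k=1}^{n}W_k^2 - \frac{n}{2}\log(1-2\theta a^2\mathcal{E}_s). \tag{74}$$
Consequently, from the law of large numbers,
$$\Lambda(\theta) = \lim_{n\to\infty}\frac{1}{n}\Lambda_n(n\theta) = \frac{\theta\,\mathbb{E}[f(X,Z)^2]}{1-2\theta a^2\mathcal{E}_s} - \frac{1}{2}\log(1-2\theta a^2\mathcal{E}_s) \quad \text{a.s.}, \tag{75}$$
where $X \sim \mathcal{N}(0,\mathcal{E}_s)$ and $Z \sim \mathcal{N}(0,\sigma^2)$. So we can evaluate the GMI through
$$I_{\mathrm{GMI}} = \sup_{a\in\mathbb{R},\,\theta<0}\left\{\theta\mathbb{E}\left[(f(X,Z) - aX)^2\right] - \frac{\theta\,\mathbb{E}[f(X,Z)^2]}{1-2\theta a^2\mathcal{E}_s} + \frac{1}{2}\log(1-2\theta a^2\mathcal{E}_s)\right\}. \tag{76}$$
Note that in the problem formulation we include the optimization of $I_{\mathrm{GMI}}$ over $a \in \mathbb{R}$.

To solve the optimization problem, we define
$$\begin{aligned} J(a,\theta) &= \theta\mathbb{E}\{[f(X,Z) - aX]^2\} - \frac{\theta\,\mathbb{E}[f(X,Z)^2]}{1-2\theta a^2\mathcal{E}_s} + \frac{1}{2}\log(1-2\theta a^2\mathcal{E}_s) \\ &= \theta\left\{\mathbb{E}[f(X,Z)^2] + a^2\mathcal{E}_s - 2a\mathbb{E}[f(X,Z)X]\right\} - \frac{\theta\,\mathbb{E}[f(X,Z)^2]}{1-2\theta a^2\mathcal{E}_s} + \frac{1}{2}\log(1-2\theta a^2\mathcal{E}_s) \\ &= \theta a^2\mathcal{E}_s + \frac{1}{2}\log(1-2\theta a^2\mathcal{E}_s) - \frac{2\theta^2a^2\mathcal{E}_s}{1-2\theta a^2\mathcal{E}_s}\mathbb{E}[f(X,Z)^2] - 2\theta a\,\mathbb{E}[f(X,Z)X]. \end{aligned} \tag{77}$$
By introducing the new variable $\gamma = -2\theta a^2\mathcal{E}_s > 0$, and choosing the sign of $a$ to agree with that of $\mathbb{E}[f(X,Z)X]$, we rewrite $J(a,\theta)$ as
$$J(\gamma,\theta) = \frac{1}{2}\log(1+\gamma) - \frac{\gamma}{2} + \frac{\gamma\theta}{1+\gamma}\mathbb{E}[f(X,Z)^2] + \sqrt{\frac{-2\theta\gamma}{\mathcal{E}_s}}\,\left|\mathbb{E}[f(X,Z)X]\right|. \tag{78}$$
Letting the partial derivative $\partial J/\partial\theta$ be zero, we find that the optimal value of $\theta < 0$ should be
$$\sqrt{-\theta_{\mathrm{opt}}} = \frac{(1+\gamma)\left|\mathbb{E}[f(X,Z)X]\right|}{\mathbb{E}[f(X,Z)^2]\sqrt{2\mathcal{E}_s\gamma}}. \tag{79}$$
Substituting $\theta_{\mathrm{opt}}$ into $J(\gamma,\theta)$ followed by some algebraic manipulation, we obtain
$$J(\gamma,\theta_{\mathrm{opt}}) = \frac{1}{2}\log(1+\gamma) - \frac{\gamma}{2} + \frac{(1+\gamma)\{\mathbb{E}[f(X,Z)X]\}^2}{2\mathcal{E}_s\mathbb{E}[f(X,Z)^2]}. \tag{80}$$
Let us define
$$\Delta = \frac{\{\mathbb{E}[f(X,Z)X]\}^2}{\mathcal{E}_s\mathbb{E}[f(X,Z)^2]}, \tag{81}$$
and maximize $J(\gamma,\theta_{\mathrm{opt}}) = \frac{1}{2}\log(1+\gamma) - \frac{\gamma}{2} + \frac{(1+\gamma)\Delta}{2}$ over $\gamma > 0$. From the Cauchy-Schwarz inequality, we see that $\Delta$ is upper bounded by one. It is then straightforward to show that the optimal value of $\gamma$ is $\gamma_{\mathrm{opt}} = \Delta/(1-\Delta)$, and hence $J(\gamma_{\mathrm{opt}},\theta_{\mathrm{opt}}) = -\frac{1}{2}\log(1-\Delta)$.

Therefore, the maximum value $J(\gamma_{\mathrm{opt}},\theta_{\mathrm{opt}})$, i.e., the GMI, is
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\left(1 + \frac{\Delta}{1-\Delta}\right), \tag{82}$$
and the optimal choice of the decoding scaling parameter $a$ is $a_{\mathrm{opt}} = \mathbb{E}[f(X,Z)X]/\mathcal{E}_s$.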
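The last optimization step, from (80) and (81) to (82), can be checked numerically. The following sketch (the helper names are illustrative) maximizes $J(\gamma,\theta_{\mathrm{opt}}) = \frac{1}{2}\log(1+\gamma) - \frac{\gamma}{2} + \frac{(1+\gamma)\Delta}{2}$ by ternary search, which is valid because this function is concave in $\gamma$, and compares the result with $\gamma_{\mathrm{opt}} = \Delta/(1-\Delta)$ and $-\frac{1}{2}\log(1-\Delta)$ in nats.

```python
import math


def J_of_gamma(gamma, delta):
    """J(gamma, theta_opt) = (1/2)log(1+gamma) - gamma/2 + (1+gamma)*delta/2, cf. (80)-(81)."""
    return 0.5 * math.log1p(gamma) - 0.5 * gamma + 0.5 * (1.0 + gamma) * delta


def maximize_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


for delta in (0.1, 0.5, 0.9):
    g = maximize_concave(lambda x: J_of_gamma(x, delta), 0.0, 100.0)
    print(delta, g, delta / (1 - delta), J_of_gamma(g, delta), -0.5 * math.log(1 - delta))
```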

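B. Derivation of the GMI for Antipodal Codebook Ensemble

We follow the same line of analysis as that for the Gaussian codebook ensemble. For $m = 1$,
$$\lim_{n\to\infty} D(1) = \mathbb{E}\{[W - aX]^2\} = \mathbb{E}[W^2] + a^2\mathcal{E}_s - 2a\mathbb{E}[WX] \quad \text{a.s.}, \tag{83}$$
where $W = f(X,Z)$ denotes the distorted channel output. On the other hand, for any $m \neq 1$, we find that
$$\frac{1}{n}\Lambda_n(n\theta) = \frac{\theta}{n}\sum_{k=1}^{n}W_k^2 + \theta a^2\mathcal{E}_s + \frac{1}{n}\sum_{k=1}^{n}\log\cosh\left(2\theta a\sqrt{\mathcal{E}_s}W_k\right), \tag{84}$$
and
$$\Lambda(\theta) = \lim_{n\to\infty}\frac{1}{n}\Lambda_n(n\theta) = \theta\mathbb{E}[W^2] + \theta a^2\mathcal{E}_s + \mathbb{E}\log\cosh\left(2\theta a\sqrt{\mathcal{E}_s}W\right) \quad \text{a.s.} \tag{85}$$
Consequently, we can evaluate the GMI by solving
$$I_{\mathrm{GMI}} = \sup_{\theta<0,\,a\in\mathbb{R}}\left\{-2\theta a\,\mathbb{E}[Xf(X,Z)] - \mathbb{E}\log\cosh\left(2\theta a\sqrt{\mathcal{E}_s}f(X,Z)\right)\right\}. \tag{86}$$
By letting $-2\theta a$ be a single variable $t$, we obtain the problem formulation given in the main text, and by using the first-derivative condition for optimality, we obtain the equation for the optimal value of $t$ as given by (10).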
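For the simplest case, binary symmetric quantization $f(X,Z) = \mathrm{sgn}(X+Z)$, (86) can be solved in one dimension. The sketch below assumes $Z \sim \mathcal{N}(0,\sigma^2)$ and $\mathrm{snr} = \mathcal{E}_s/\sigma^2$ (these parameter names are ours). With $u = t\sqrt{\mathcal{E}_s}$ and $W = \pm 1$, the objective becomes $u(1-2p) - \log\cosh u$, where $p = Q(\sqrt{\mathrm{snr}})$ is the crossover probability. A ternary search on this concave function should agree with the binary symmetric channel mutual information $\log 2 - h(p)$ in nats, which is consistent with the general statement in Supplementary Material E that the GMI coincides with $I(X;W)$ for antipodal inputs and symmetric quantizers.

```python
import math


def log_cosh(u):
    """Numerically stable log(cosh(u))."""
    u = abs(u)
    return u + math.log1p(math.exp(-2.0 * u)) - math.log(2.0)


def gmi_antipodal_binary(snr, lo=0.0, hi=30.0, iters=200):
    """GMI (nats) from (86) for antipodal input and w = sgn(x + z).

    Assumes Z ~ N(0, sigma^2) and snr = Es / sigma^2. With u = t*sqrt(Es) and
    W = +-1, the objective of (86) is u*(1 - 2p) - log cosh(u), p = Q(sqrt(snr)).
    """
    p = 0.5 * math.erfc(math.sqrt(snr / 2.0))  # crossover probability Q(sqrt(snr))
    f = lambda u: u * (1.0 - 2.0 * p) - log_cosh(u)
    for _ in range(iters):  # ternary search; the objective is concave in u
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi)), p


def bsc_mutual_info_nats(p):
    """log 2 - h(p), with h the binary entropy in nats."""
    return math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)


for snr in (0.1, 1.0, 10.0):
    gmi, p = gmi_antipodal_binary(snr)
    print(snr, gmi, bsc_mutual_info_nats(p))
```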
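C. General Framework for Complex-Valued Nyquist-Sampled Channels

We can extend the general GMI formula (7) for real-valued channels to complex-valued channels. Let the noise $Z_k$ be a sequence of i.i.d. circularly symmetric complex Gaussian random variables (i.e., $Z_k \sim \mathcal{CN}(0,\sigma^2)$). The memoryless nonlinearity mapping $f(\cdot)$ transforms $(x,z)$ into a complex number $f(x,z)$. Hence the observation is $W_k = f(X_k,Z_k)$, for $k = 1, 2, \ldots, n$.

For transmission, we restrict the codebook to be an i.i.d. $\mathcal{CN}(0,\mathcal{E}_s)$ ensemble. The decoder follows a nearest-neighbor rule, which computes, for all possible messages, the distance metric
$$D(m) = \frac{1}{n}\sum_{k=1}^{n}\left|w_k - a x_k(m)\right|^2, \quad m \in \mathcal{M}, \tag{87}$$
and decides the received message as $\hat{m} = \arg\min_{m\in\mathcal{M}} D(m)$.

Analogously to the development for the real-valued channel model in Section III, we arrive at
$$I_{\mathrm{GMI}} = \sup_{a\in\mathbb{C},\,\theta<0}\left\{\theta\mathbb{E}\left\{|f(X,Z) - aX|^2\right\} - \frac{\theta\,\mathbb{E}|f(X,Z)|^2}{1-\theta|a|^2\mathcal{E}_s} + \log(1-\theta|a|^2\mathcal{E}_s)\right\}. \tag{88}$$
Note that in the problem formulation we include the optimization of $I_{\mathrm{GMI}}$ over $a \in \mathbb{C}$. Define the expression in the right-hand side of (88) as $J(a,\theta)$, which can be rewritten as
$$J(a,\theta) = \theta|a|^2\mathcal{E}_s + \log(1-\theta|a|^2\mathcal{E}_s) - \frac{\theta^2|a|^2\mathcal{E}_s\,\mathbb{E}|f(X,Z)|^2}{1-\theta|a|^2\mathcal{E}_s} - 2\theta|a|\,\Re\,\mathbb{E}\left\{e^{-j\phi}f(X,Z)X^*\right\}, \tag{89}$$
where $\phi$ is the phase of $a$, and $\Re$ denotes the real part of a complex number. By introducing the new variable $\gamma = -\theta|a|^2\mathcal{E}_s > 0$, we further rewrite $J(a,\theta)$ as
$$J(\gamma,\phi,\theta) = \log(1+\gamma) - \gamma + \frac{\gamma\theta}{1+\gamma}\mathbb{E}|f(X,Z)|^2 + 2\sqrt{\frac{-\theta\gamma}{\mathcal{E}_s}}\,\Re\,\mathbb{E}\left\{e^{-j\phi}f(X,Z)X^*\right\}. \tag{90}$$
Letting the partial derivative $\partial J/\partial\theta$ be zero, we find that the optimal value of $\theta < 0$ should be
$$\sqrt{-\theta_{\mathrm{opt}}} = \frac{(1+\gamma)\,\Re\,\mathbb{E}\left\{e^{-j\phi}f(X,Z)X^*\right\}}{\mathbb{E}|f(X,Z)|^2\sqrt{\mathcal{E}_s\gamma}}. \tag{91}$$
Substituting $\theta_{\mathrm{opt}}$ into $J(\gamma,\phi,\theta)$ followed by some algebraic manipulation, we obtain
$$J(\gamma,\phi,\theta_{\mathrm{opt}}) = \log(1+\gamma) - \gamma + \frac{(1+\gamma)\left[\Re\,\mathbb{E}\left\{e^{-j\phi}f(X,Z)X^*\right\}\right]^2}{\mathcal{E}_s\mathbb{E}|f(X,Z)|^2}. \tag{92}$$
Let us define
$$\Delta(\phi) = \frac{\left[\Re\,\mathbb{E}\left\{e^{-j\phi}f(X,Z)X^*\right\}\right]^2}{\mathcal{E}_s\mathbb{E}|f(X,Z)|^2}, \tag{93}$$
and maximize $J(\gamma,\phi,\theta_{\mathrm{opt}}) = \log(1+\gamma) - \gamma + (1+\gamma)\Delta(\phi)$ over $\gamma > 0$. It is straightforward to show that the optimal value of $\gamma$ is $\gamma_{\mathrm{opt}} = \Delta(\phi)/(1-\Delta(\phi))$, and hence $J(\gamma_{\mathrm{opt}},\phi,\theta_{\mathrm{opt}}) = -\log(1-\Delta(\phi))$.

It is clear that $J(\gamma_{\mathrm{opt}},\phi,\theta_{\mathrm{opt}})$ is maximized by choosing $\phi = \phi_{\mathrm{opt}}$, the phase of $\mathbb{E}\{f(X,Z)X^*\}$, which maximizes $\Delta(\phi)$. Denote $\Delta(\phi_{\mathrm{opt}})$ by $\Delta_{\mathrm{opt}}$, which is
$$\Delta_{\mathrm{opt}} = \frac{\left|\mathbb{E}\{f(X,Z)X^*\}\right|^2}{\mathcal{E}_s\mathbb{E}|f(X,Z)|^2}. \tag{94}$$
Therefore, the maximum value $J(\gamma_{\mathrm{opt}},\phi_{\mathrm{opt}},\theta_{\mathrm{opt}})$, i.e., the GMI, is
$$I_{\mathrm{GMI}} = \log\left(1 + \frac{\Delta_{\mathrm{opt}}}{1-\Delta_{\mathrm{opt}}}\right) = \log(1+\mathrm{SNR}_e), \tag{95}$$
and the optimal choice of the decoding scaling parameter $a$ is $a_{\mathrm{opt}} = \mathbb{E}\{f(X,Z)X^*\}/\mathcal{E}_s$.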
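As an illustration of (94) and (95), the sketch below estimates $\Delta_{\mathrm{opt}}$ by Monte Carlo for a hypothetical example nonlinearity: separate one-bit quantization of the in-phase and quadrature components, $f(X,Z) = \mathrm{sgn}(\Re Y) + j\,\mathrm{sgn}(\Im Y)$ with $Y = X + Z$. For this particular $f$ and Gaussian inputs, the arcsine (Bussgang-type) law gives $\Delta_{\mathrm{opt}} = \frac{2}{\pi}\cdot\frac{\mathcal{E}_s}{\mathcal{E}_s+\sigma^2}$. This closed form is our own side calculation, not a statement from this section, and is used only as a reference value. The estimated phase $\phi_{\mathrm{opt}}$ should be close to zero.

```python
import math
import random


def delta_opt_iq_sign(Es, sigma2, n=100_000, seed=7):
    """Monte Carlo estimate of Delta_opt in (94), and of phi_opt, for
    f(X, Z) = sgn(Re Y) + j sgn(Im Y), Y = X + Z, X ~ CN(0, Es), Z ~ CN(0, sigma2)."""
    rng = random.Random(seed)
    sx, sz = math.sqrt(Es / 2.0), math.sqrt(sigma2 / 2.0)  # per-component std devs
    cross = 0j      # accumulates f * conj(X)
    power = 0.0     # accumulates |f|^2
    for _ in range(n):
        x = complex(rng.gauss(0.0, sx), rng.gauss(0.0, sx))
        y = x + complex(rng.gauss(0.0, sz), rng.gauss(0.0, sz))
        f = complex(1.0 if y.real >= 0 else -1.0, 1.0 if y.imag >= 0 else -1.0)
        cross += f * x.conjugate()
        power += f.real ** 2 + f.imag ** 2
    cross /= n
    power /= n
    return abs(cross) ** 2 / (Es * power), math.atan2(cross.imag, cross.real)


Es, sigma2 = 1.0, 1.0
delta, phi = delta_opt_iq_sign(Es, sigma2)
closed_form = (2.0 / math.pi) * Es / (Es + sigma2)  # reference value for this particular f
print(delta, closed_form, phi, math.log(1.0 / (1.0 - delta)))  # last entry: GMI (95), nats
```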



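D. Derivation of Eqn. (25) and (26)

Consider the $2M$-level symmetric quantizer that outputs $f(x+z) = \pm r_i$ when $a_{i-1} < \pm(x+z) \leq a_i$, $i = 1, \ldots, M$, with $a_0 = 0$ and $a_M = \infty$, and let $p_X$ and $p_Z$ denote the densities of $X \sim \mathcal{N}(0,\mathcal{E}_s)$ and $Z \sim \mathcal{N}(0,\sigma^2)$. By symmetry,
$$\mathbb{E}[f(X+Z)^2] = 2\sum_{i=1}^{M}r_i^2\iint_{a_{i-1}<x+z\leq a_i}p_X(x)p_Z(z)\,dx\,dz = 2\sum_{i=1}^{M}r_i^2\left[Q\!\left(\frac{a_{i-1}}{\sqrt{\mathcal{E}_s+\sigma^2}}\right) - Q\!\left(\frac{a_i}{\sqrt{\mathcal{E}_s+\sigma^2}}\right)\right],$$
$$\begin{aligned}\mathbb{E}[f(X+Z)X] &= 2\sum_{i=1}^{M}r_i\iint_{a_{i-1}<x+z\leq a_i}x\,p_X(x)p_Z(z)\,dx\,dz = 2\sum_{i=1}^{M}r_i\int_{-\infty}^{\infty}p_Z(z)\left[\int_{a_{i-1}-z}^{a_i-z}x\,p_X(x)\,dx\right]dz \\ &= 2\sum_{i=1}^{M}r_i\left[\int_{-\infty}^{\infty}p_Z(z)F(a_{i-1}-z)\,dz - \int_{-\infty}^{\infty}p_Z(z)F(a_i-z)\,dz\right] \\ &= \sqrt{\frac{2}{\pi}}\frac{\mathcal{E}_s}{\sqrt{\mathcal{E}_s+\sigma^2}}\sum_{i=1}^{M}r_i\left[e^{-a_{i-1}^2/(2(\mathcal{E}_s+\sigma^2))} - e^{-a_i^2/(2(\mathcal{E}_s+\sigma^2))}\right],\end{aligned}$$
where $F(x) = \int_x^{\infty}t\,p_X(t)\,dt = \mathcal{E}_s\,p_X(x)$, and the last step uses $\int p_Z(z)\,p_X(a-z)\,dz = p_{X+Z}(a)$, the $\mathcal{N}(0,\mathcal{E}_s+\sigma^2)$ density.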
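The two closed-form moments above translate directly into code. The sketch below uses a hypothetical parametrization (a list of thresholds $[a_0 = 0, a_1, \ldots, a_M = \infty]$ and a list of reconstruction points $[r_1, \ldots, r_M]$) and returns $\mathbb{E}[f(X+Z)^2]$, $\mathbb{E}[f(X+Z)X]$ and $\Delta$ from (81). For $M = 1$ and $r_1 = 1$, i.e., one-bit quantization, it reduces to $\Delta = \frac{2}{\pi}\cdot\frac{\mathcal{E}_s}{\mathcal{E}_s+\sigma^2}$.

```python
import math


def Q(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def quantizer_moments(thresholds, levels, Es, sigma2):
    """E[f(X+Z)^2], E[f(X+Z)X] and Delta for a 2M-level symmetric quantizer.

    Hypothetical parametrization: f(y) = sgn(y) * levels[i-1] when
    thresholds[i-1] < |y| <= thresholds[i], with thresholds = [0, a_1, ..., inf];
    X ~ N(0, Es), Z ~ N(0, sigma2).
    """
    s = math.sqrt(Es + sigma2)
    e_f2 = 0.0
    e_fx = 0.0
    for i, r in enumerate(levels, start=1):
        lo, hi = thresholds[i - 1], thresholds[i]
        e_f2 += 2.0 * r * r * (Q(lo / s) - Q(hi / s))
        e_fx += r * (math.exp(-lo * lo / (2 * s * s)) - math.exp(-hi * hi / (2 * s * s)))
    e_fx *= math.sqrt(2.0 / math.pi) * Es / s
    return e_f2, e_fx, e_fx ** 2 / (Es * e_f2)


print(quantizer_moments([0.0, math.inf], [1.0], 1.0, 1.0))              # one-bit quantizer
print(quantizer_moments([0.0, 1.0, math.inf], [0.5, 1.5], 1.0, 1.0))    # a 4-level example
```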



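E. Nearest-Neighbor Decoding for Antipodal Input and Symmetric Output Quantizers

For a given $2M$-level symmetric quantizer, and for antipodal inputs, we can evaluate the GMI following the result in Section III. Denote the probability $\Pr[W = r_i|X = \sqrt{\mathcal{E}_s}]$ by $p_i^{(+)}$ and $\Pr[W = -r_i|X = \sqrt{\mathcal{E}_s}]$ by $p_i^{(-)}$; by symmetry, we have $\Pr[W = r_i|X = -\sqrt{\mathcal{E}_s}] = p_i^{(-)}$ and $\Pr[W = -r_i|X = -\sqrt{\mathcal{E}_s}] = p_i^{(+)}$, and $\Pr[W = r_i] = \Pr[W = -r_i] = (p_i^{(+)} + p_i^{(-)})/2$. The GMI thus is
$$I_{\mathrm{GMI}} = \sup_{t}\sum_{i=1}^{M}\left[t\sqrt{\mathcal{E}_s}\left(p_i^{(+)} - p_i^{(-)}\right)r_i - \left(p_i^{(+)} + p_i^{(-)}\right)\log\cosh\left(t\sqrt{\mathcal{E}_s}\,r_i\right)\right]. \tag{96}$$
Maximizing the GMI with respect to the reconstruction points $r_i$, we have that the optimal $r_i$ satisfy
$$t\sqrt{\mathcal{E}_s}\,r_i = \mathrm{artanh}\left(\frac{p_i^{(+)} - p_i^{(-)}}{p_i^{(+)} + p_i^{(-)}}\right) = \frac{1}{2}\log\frac{p_i^{(+)}}{p_i^{(-)}}, \quad i = 1,\ldots,M, \tag{97}$$
and that, using $\sum_{i=1}^{M}(p_i^{(+)} + p_i^{(-)}) = 1$, the GMI further reduces to
$$\begin{aligned} I_{\mathrm{GMI}} &= \sum_{i=1}^{M}\left[\frac{p_i^{(+)} - p_i^{(-)}}{2}\log\frac{p_i^{(+)}}{p_i^{(-)}} + \left(p_i^{(+)} + p_i^{(-)}\right)\log 2 - \left(p_i^{(+)} + p_i^{(-)}\right)\log\frac{p_i^{(+)} + p_i^{(-)}}{\sqrt{p_i^{(+)}p_i^{(-)}}}\right] \\ &= \log 2 - \sum_{i=1}^{M}\left[\left(p_i^{(+)} + p_i^{(-)}\right)\log\left(p_i^{(+)} + p_i^{(-)}\right) - p_i^{(+)}\log p_i^{(+)} - p_i^{(-)}\log p_i^{(-)}\right] = I(X;W). \end{aligned} \tag{98}$$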



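That is, the GMI coincides with the channel input-output mutual information, which is achievable by maximum-likelihood decoding. This seemingly surprising result is in fact reasonable, because there is indeed a nearest-neighbor decoding realization of the maximum-likelihood decoding rule when the channel input is antipodal and the output quantization is symmetric. Choosing the reconstruction points as $r_i = \log[p_i^{(+)}/p_i^{(-)}]$, $i = 1,\ldots,M$, and denoting $w_k$ by $r_{i_k}\mathrm{sgn}(w_k)$, where $i_k$ is the index with $|w_k| = r_{i_k}$, we can write the nearest-neighbor decoding metric as
$$\begin{aligned} D(m) &= \frac{1}{n}\sum_{k=1}^{n}\left[\log\frac{p_{i_k}^{(+)}}{p_{i_k}^{(-)}}\,\mathrm{sgn}(w_k) - x_k(m)\right]^2 \\ &= \frac{1}{n}\sum_{k=1}^{n}\left[\log\frac{p_{i_k}^{(+)}}{p_{i_k}^{(-)}}\right]^2 + \mathcal{E}_s - \frac{2}{n}\sum_{k=1}^{n}\log\frac{p_{i_k}^{(+)}}{p_{i_k}^{(-)}}\,\mathrm{sgn}(w_k)\,x_k(m). \end{aligned} \tag{99}$$
The first two terms in (99) are independent of the codeword, and thus it suffices to examine
$$D_1(m) = -\frac{1}{n}\sum_{k=1}^{n}\log\frac{p_{i_k}^{(+)}}{p_{i_k}^{(-)}}\,\mathrm{sgn}(w_k)\,x_k(m), \tag{100}$$
which can be further equivalently reduced to
$$\begin{aligned} D_1(m) &= -\frac{\sqrt{\mathcal{E}_s}}{n}\sum_{k=1}^{n}\left[\log\frac{p_{i_k}^{(+)}}{p_{i_k}^{(-)}}\,\mathrm{sgn}(w_kx_k(m)) + \log\left(p_{i_k}^{(+)}p_{i_k}^{(-)}\right)\right] + \frac{\sqrt{\mathcal{E}_s}}{n}\sum_{k=1}^{n}\log\left(p_{i_k}^{(+)}p_{i_k}^{(-)}\right) \\ &= -\frac{2\sqrt{\mathcal{E}_s}}{n}\sum_{k=1}^{n}\log\Pr[w_k|x_k(m)] + \frac{\sqrt{\mathcal{E}_s}}{n}\sum_{k=1}^{n}\log\left(p_{i_k}^{(+)}p_{i_k}^{(-)}\right), \end{aligned} \tag{101}$$
in which the last term does not depend on the codeword. Minimizing $D(m)$ is thus identical to maximum-likelihood decoding.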
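The equivalence in (99) to (101) can be illustrated with a toy simulation. In the sketch below, the transition probabilities `p_plus` and `p_minus` of a 4-level ($M = 2$) symmetric quantizer are hypothetical numbers, not derived from any Gaussian channel. The reconstruction points are set to $r_i = \log(p_i^{(+)}/p_i^{(-)})$, and the code checks that the nearest-neighbor metric equals a codeword-independent constant minus $(4\sqrt{\mathcal{E}_s}/n)\sum_k\log\Pr[w_k|x_k(m)]$, so nearest-neighbor and maximum-likelihood decisions agree up to ties.

```python
import math
import random

# Hypothetical statistics of a 4-level (M = 2) symmetric quantizer:
# p_plus[i] = Pr[W = +r_i | X = +sqrt(Es)], p_minus[i] = Pr[W = -r_i | X = +sqrt(Es)].
p_plus = [0.25, 0.50]
p_minus = [0.15, 0.10]
Es = 1.0
r = [math.log(pp / pm) for pp, pm in zip(p_plus, p_minus)]  # r_i = log(p_i^+ / p_i^-)


def sample_output(x, rng):
    """Draw (index i, sign of w) of the quantizer output given antipodal input x."""
    u, acc = rng.random(), 0.0
    for i in range(len(p_plus)):
        for prob, agree in ((p_plus[i], True), (p_minus[i], False)):
            acc += prob
            if u < acc:
                return i, (1 if x > 0 else -1) * (1 if agree else -1)
    return len(p_plus) - 1, (-1 if x > 0 else 1)  # guard against rounding in acc


def nn_metric(obs, codeword):
    """Nearest-neighbor distance (99) with w_k = r_{i_k} sgn(w_k)."""
    return sum((r[i] * s - x) ** 2 for (i, s), x in zip(obs, codeword)) / len(codeword)


def log_likelihood(obs, codeword):
    """sum_k log Pr[w_k | x_k(m)] for the symmetric quantized channel."""
    return sum(math.log(p_plus[i] if s * x > 0 else p_minus[i])
               for (i, s), x in zip(obs, codeword))


rng = random.Random(3)
n, num_codewords = 12, 32
codebook = [[rng.choice((-1.0, 1.0)) * math.sqrt(Es) for _ in range(n)]
            for _ in range(num_codewords)]
obs = [sample_output(x, rng) for x in codebook[0]]
m_nn = min(range(num_codewords), key=lambda m: nn_metric(obs, codebook[m]))
m_ml = max(range(num_codewords), key=lambda m: log_likelihood(obs, codebook[m]))
print(m_nn, m_ml)
```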
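F. Super-Nyquist Output Sampling with Antipodal Inputs

We examine the scenario where the input is antipodal, and where the decoder follows the linearly weighted nearest-neighbor decoding rule:
$$D(m) = \frac{1}{n}\sum_{k=1}^{n}\left(\sum_{l=0}^{L-1}\beta_l w_{k,l} - x_k(m)\right)^2, \quad m \in \mathcal{M}. \tag{102}$$
Following the same line of analysis as that for the Gaussian codebook ensemble, we have, for $m = 1$,
$$\lim_{n\to\infty} D(1) = \mathbb{E}\left[\left(\sum_{l=0}^{L-1}\beta_l W_{0,l} - X_0\right)^2\right] \quad \text{a.s.}, \tag{103}$$
and for any $m \neq 1$,
$$\lim_{n\to\infty}\frac{1}{n}\Lambda_n(n\theta) = \theta\mathbb{E}\left[\left(\sum_{l=0}^{L-1}\beta_l W_{0,l}\right)^2\right] + \theta\mathcal{E}_s + \mathbb{E}\log\cosh\left(2\theta\sqrt{\mathcal{E}_s}\sum_{l=0}^{L-1}\beta_l W_{0,l}\right) \quad \text{a.s.}, \tag{104}$$
where $\{W_{0,l}\}_{l=0}^{L-1}$ are induced by an infinite sequence of inputs, $\{X_k\}_{k=-\infty}^{\infty}$. Through some manipulations (in particular, absorbing the factor $-2\theta > 0$ into $\boldsymbol{\beta}$), we thus obtain the resulting GMI as
$$I_{\mathrm{GMI}} = \sup_{\boldsymbol{\beta}}\left\{\sum_{l=0}^{L-1}\beta_l\mathbb{E}[X_0W_{0,l}] - \mathbb{E}\log\cosh\left(\sqrt{\mathcal{E}_s}\sum_{l=0}^{L-1}\beta_l W_{0,l}\right)\right\}. \tag{105}$$
Consequently, the optimal choice of the weighting coefficients, $\boldsymbol{\beta}$, obeys
$$\mathbb{E}\left[W_{0,l}\tanh\left(\sqrt{\mathcal{E}_s}\sum_{j=0}^{L-1}\beta_j W_{0,j}\right)\right] = \frac{1}{\sqrt{\mathcal{E}_s}}\mathbb{E}[X_0W_{0,l}], \quad l = 0, 1, \ldots, L-1, \tag{106}$$
which constitute an array of transcendental equations.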

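We further focus on the special case of the binary symmetric quantizer $w = \mathrm{sgn}(x+z)$ and $L = 2$. From the symmetry in the setup, we see that $\beta_0 = \beta_1 = \beta$, and we only need to solve a single equation:
$$\mathbb{E}\left[W_{0,0}\tanh\left(\sqrt{\mathcal{E}_s}\,\beta\,(W_{0,0} + W_{0,1})\right)\right] = \frac{1}{\sqrt{\mathcal{E}_s}}\mathbb{E}[X_0W_{0,0}]. \tag{107}$$
For convenience, we denote $\Pr[(W_{0,0},W_{0,1}) = (1,1)] = \Pr[(W_{0,0},W_{0,1}) = (-1,-1)] = \eta$, $\Pr[(W_{0,0},W_{0,1}) = (1,-1)] = \Pr[(W_{0,0},W_{0,1}) = (-1,1)] = 1/2 - \eta$, and $\Pr[W_{0,0} = 1|X_0 = \sqrt{\mathcal{E}_s}] = \kappa$. So (107) becomes
$$2\eta\tanh\left(2\sqrt{\mathcal{E}_s}\,\beta\right) = 2\kappa - 1, \quad\text{i.e.,}\quad \beta = \frac{1}{4\sqrt{\mathcal{E}_s}}\log\frac{2(\eta+\kappa) - 1}{2(\eta-\kappa) + 1}. \tag{108}$$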
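The closed-form solution (108) can be evaluated and checked against the objective (105). The sketch below relies on an assumption of ours: besides $\beta_0 = \beta_1 = \beta$, we take $\mathbb{E}[X_0W_{0,1}] = \mathbb{E}[X_0W_{0,0}] = \sqrt{\mathcal{E}_s}(2\kappa-1)$, so that (105) reduces to $u(2\kappa-1) - 2\eta\log\cosh u$ with $u = 2\sqrt{\mathcal{E}_s}\beta$. The values $\kappa = 0.8$ and $\eta = 0.4$ are hypothetical and are not computed from any channel.

```python
import math


def beta_108(kappa, eta, Es):
    """Closed-form solution of (108) for L = 2 and w = sgn(x + z)."""
    ratio = (2.0 * kappa - 1.0) / (2.0 * eta)
    if not -1.0 < ratio < 1.0:
        raise ValueError("(108) has no finite solution for these kappa, eta")
    return math.atanh(ratio) / (2.0 * math.sqrt(Es))


def objective_105_L2(beta, kappa, eta, Es):
    """Objective of (105) for L = 2 with beta_0 = beta_1 = beta, assuming
    E[X_0 W_{0,1}] = E[X_0 W_{0,0}] = sqrt(Es) * (2 kappa - 1). Result in nats."""
    u = 2.0 * math.sqrt(Es) * beta
    return u * (2.0 * kappa - 1.0) - 2.0 * eta * math.log(math.cosh(u))


kappa, eta, Es = 0.8, 0.4, 1.0   # hypothetical values for illustration
beta = beta_108(kappa, eta, Es)
print(beta, objective_105_L2(beta, kappa, eta, Es))
```

Because $u(2\kappa-1) - 2\eta\log\cosh u$ is concave in $u$, the stationary point given by (108) is its maximizer whenever $|2\kappa - 1| < 2\eta$.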
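G. Derivation of the GMI in Proposition 2

Denoting the expression in the right-hand side of (53) by $J(\boldsymbol{\beta},\theta)$, and enforcing its partial derivatives with respect to $\{\beta_l\}_{l=-L+1}^{L-1}$ to vanish, we have
$$\begin{aligned}\frac{\partial J}{\partial\beta_l} &= 2\theta\,\mathbb{E}\left[\left(\sum_{u=-L+1}^{L-1}\beta_uW_{0,u} - X_0\right)W_{0,l}\right] - \frac{2\theta}{1-2\theta\mathcal{E}_s}\mathbb{E}\left[\left(\sum_{u=-L+1}^{L-1}\beta_uW_{0,u}\right)W_{0,l}\right] \\ &= 2\theta\left(\sum_{u=-L+1}^{L-1}\beta_u\mathbb{E}[W_{0,u}W_{0,l}] - \mathbb{E}[X_0W_{0,l}]\right) - \frac{2\theta}{1-2\theta\mathcal{E}_s}\sum_{u=-L+1}^{L-1}\beta_u\mathbb{E}[W_{0,u}W_{0,l}] = 0,\end{aligned} \tag{109}$$
for $l = -L+1, \ldots, L-1$. Summarizing these $2L-1$ equations, we can write them collectively as
$$\Omega\boldsymbol{\beta} = \left(1 - \frac{1}{2\theta\mathcal{E}_s}\right)\mathbf{b}, \tag{110}$$
where $\Omega$ is a $(2L-1)\times(2L-1)$ matrix with its $(u,l)$-element being $\mathbb{E}[W_{0,u}W_{0,l}]$, and $\mathbf{b}$ is a $(2L-1)$-dimensional vector with its $l$-element being $\mathbb{E}[X_0W_{0,l}]$. Hence we have
$$\boldsymbol{\beta} = \left(1 - \frac{1}{2\theta\mathcal{E}_s}\right)\Omega^{-1}\mathbf{b}. \tag{111}$$
Substituting (111) into $J(\boldsymbol{\beta},\theta)$, we get
$$J(\boldsymbol{\beta},\theta) = \frac{1}{2\mathcal{E}_s}\mathbf{b}^T\Omega^{-1}\mathbf{b} + \theta\mathcal{E}_s - \theta\,\mathbf{b}^T\Omega^{-1}\mathbf{b} + \frac{1}{2}\log(1-2\theta\mathcal{E}_s). \tag{112}$$
From (112), we maximize $J(\boldsymbol{\beta},\theta)$ by letting
$$1 - 2\theta\mathcal{E}_s = \frac{\mathcal{E}_s}{\mathcal{E}_s - \mathbf{b}^T\Omega^{-1}\mathbf{b}}, \tag{113}$$
and the maximum value of $J(\boldsymbol{\beta},\theta)$, i.e., the GMI, is
$$I_{\mathrm{GMI}} = \frac{1}{2}\log\left(1 + \frac{\mathbf{b}^T\Omega^{-1}\mathbf{b}/\mathcal{E}_s}{1 - \mathbf{b}^T\Omega^{-1}\mathbf{b}/\mathcal{E}_s}\right). \tag{114}$$
Substituting (113) into (111) gives $1 - 1/(2\theta\mathcal{E}_s) = \mathcal{E}_s/(\mathbf{b}^T\Omega^{-1}\mathbf{b})$, which yields the optimal weighting coefficients in (55).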
Fig. 1. Illustration of the general channel model with distortion.
Fig. 2. The GMI and the channel capacity of the real Gaussian channel with binary symmetric output quantization.

Fig. 3. The GMI achieved by optimal 2M-level quantizers, for M = 2, 3, . . . , 8. 



Fig. 4. The GMI achieved by super-Nyquist sampling with binary symmetric quantization and sinc pulse function, for L = 1, 2, 4, 8, 16.
M    max κ     optimal a
2    2.7725    0.481
3    2.9569    0.253
4    3.0291    0.159
5    3.0651    0.111
6    3.0858    0.082
7    3.0989    0.064
8    3.1077    0.051

TABLE I
Table of performance for optimized uniform 2M-level symmetric output quantization.



M    κ
2    2.7488
3    2.9267
4    3.0011
5    3.0404
6    3.0642
7    3.0798
8    3.0908

TABLE II
Table of performance for t-uniform 2M-level symmetric output quantization.



M    max κ     optimal t
2    2.7725    0.618
3    2.9595    0.805
4    3.0330    0.880
5    3.0695    0.922
6    3.0902    0.943
7    3.1032    0.958
8    3.1117    0.967

TABLE III
Table of performance for optimal 2M-level symmetric output quantization.



L     $\mathbf{b}_0^T\Omega_0^{-1}\mathbf{b}_0$     $\lim_{\mathrm{SNR}\to\infty} I_{\mathrm{GMI}}$ (bits/c.u.)     $\lim_{\mathrm{SNR}\to 0} I_{\mathrm{GMI}}/\mathrm{SNR}$
1     $2/\pi$     0.7302     0.3183
2     0.7173      0.9114     0.3587
4     0.7591      1.0268     0.3796
8     0.7734      1.0710     0.3867
16    0.7783      1.0867     0.3892
32    0.7801      1.0926     0.3901
∞     0.7815      1.0970     0.3907

TABLE IV
Table of performance for super-Nyquist output sampling with binary symmetric quantization and sinc pulse function.
L     $\lim_{\mathrm{SNR}\to 0} I_{\mathrm{GMI}}/\mathrm{SNR}$
2     0.3731
4     0.3923
8     0.3971
16    0.3984
32    0.3987
∞     0.3988

TABLE V
Table of performance for super-Nyquist output sampling with binary symmetric quantization and optimized pulse function.